
Agile Data Warehouse Design Sampler

This book describes BEAM, an agile approach to dimensional modeling. BEAM improves communication between data warehouse designers, business intelligence stakeholders, and the development team, and provides tools that encourage thinking dimensionally from the start. Readers learn how to capture BI requirements and create high-performance dimensional models through brainstorming sessions with stakeholders. The result is that designers and developers understand how to implement dimensional modeling solutions, while stakeholders feel ownership of the data warehouse they helped create.


Agile Data Warehouse Design
Collaborative Dimensional Modeling, from Whiteboard to Star Schema
Lawrence Corr
with Jim Stagnitto

Sampler
Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing/business intelligence (DW/BI) requirements and turning them into high-performance dimensional models in the most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders.

This book describes BEAM✲, an agile approach to dimensional modeling, for improving communication between data warehouse designers, BI stakeholders and the whole DW/BI development team. BEAM✲ provides tools and techniques that will encourage DW/BI designers and developers to move away from their keyboards and entity-relationship-based tools and model interactively with their colleagues. The result is that everyone thinks dimensionally from the outset! Developers understand how to efficiently implement dimensional modeling solutions. Business stakeholders feel ownership of the data warehouse they have created, and can already imagine how they will use it to answer their business questions.

Within this book, you will learn:
✲ Agile dimensional modeling using Business Event Analysis & Modeling (BEAM✲)
✲ Modelstorming: data modeling that is quicker, more inclusive, more productive, and frankly more fun!
✲ Telling dimensional data stories using the 7Ws (who, what, when, where, how many, why and how)
✲ Modeling by example, not abstraction; using data story themes, not crow’s feet, to describe detail
✲ Storyboarding the data warehouse to discover conformed dimensions and plan iterative development
✲ Visual modeling: sketching timelines, charts and grids to model complex process measurement – simply
✲ Agile design documentation: enhancing star schemas with BEAM✲ dimensional shorthand notation
✲ Solving difficult DW/BI performance and usability problems with proven dimensional design patterns

Lawrence Corr is a data warehouse designer and educator. As Principal of DecisionOne
Consulting, he helps organizations to review and simplify their data warehouse designs, and
advises on visual data modeling techniques. He regularly holds agile modeling workshops
worldwide and has taught dimensional DW/BI skills to thousands of business/IT professionals.

Jim Stagnitto is a data warehouse and master data management architect specializing in
the healthcare, financial services, and information service industries. He is the founder of the
data warehousing and data mining consulting firm Llumino.

decisionone.co.uk
 BEAM✲
modelstorming.com
Agile Data Warehouse Design
Collaborative Dimensional Modeling,
from Whiteboard to Star Schema

Lawrence Corr
with Jim Stagnitto
Agile Data Warehouse Design
by Lawrence Corr with Jim Stagnitto

Copyright © 2011, 2012, 2013 by Lawrence Corr. All rights reserved.

No part of this book may be reproduced in any form or by any electronic or mechanical means including
information presentation, storage and retrieval systems, without permission in writing from the copyright holder.
The only exception is by a reviewer, who may quote short excerpts in a review.

This eBook is free of copy protection or functionality restrictions. You may view or print it for
personal use as you see fit. You may make copies for your own personal use (e.g., one for use while
traveling and one on a home computer or backup device), but you may not give the eBook file to
other people. The file is personalized with an email address on the cover and other identifying
information and belongs to that email user. Ownership cannot be transferred or sold. You may print the eBook, but
it has been formatted specifically for on-screen viewing with no blank pages, so margins and facing pages will not
print correctly. Generally, it is cheaper and more efficient to order a paperback copy than to print out the entire book.

Published by DecisionOne Press, Burwood House, Leeds LS28 7UJ, UK.


Email: [email protected], Tel: +44 7971 964824.

Proofing: Laurence Hesketh, Geoff Hurrell
Illustrators: Gill Guile and Lawrence Corr
Cover Design: After Escher
Printing History:
November 2011: First Edition. January 2012: Revision. October 2012: Revision. May 2013: Revision.

Non-Printing History:
May 2013: eBook First Edition.

Displayed on recycled pixels

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and DecisionOne Press was aware of a trademark claim, the
designations have been printed in caps or initial caps.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties,
including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended
by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation.
Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or
Website is referred to in this work as a citation and/or a potential source of further information does not mean that
the author or the publisher endorses the information the organization or Website may provide or recommendations
it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or
disappeared between when this work was written and when it is read.

ISBN: 978-0-9568172-0-4 [2013-05-10]


ABOUT THE AUTHORS
Lawrence Corr is a data warehouse designer and educator. As Principal of DecisionOne Consulting, he
helps organizations to improve their Business Intelligence systems through the use of visual data
modeling techniques. He regularly teaches agile dimensional modeling courses worldwide and has taught
DW/BI skills to thousands of IT professionals since 2000.

Lawrence has developed and reviewed data warehouses for healthcare, telecommunications, engineering,
broadcasting, financial services and public service organizations. He held the position of data warehouse
practice leader at Linc Systems Corporation, CT, USA and vice-president of data warehousing products
at Teleran Technologies, NJ, USA. Lawrence was also a Ralph Kimball Associate and has taught data
warehousing classes for Kimball University in Europe and South Africa and contributed to Kimball
articles and design tips. He lives in Yorkshire, England with his wife Mary and daughter Aimee.
Lawrence can be contacted at:

[email protected]

Jim Stagnitto is a data warehouse and master data management architect specializing in the healthcare,
financial services and information service industries. He is the founder of the data warehousing and data
mining consulting firm Llumino.

Jim has been a guest contributor for Ralph Kimball’s Intelligent Enterprise column, and a contributing
author to Ralph Kimball & Joe Caserta’s The Data Warehouse ETL Toolkit. He lives in Bucks County,
PA, USA with his wife Lori, and their happy brood of pets. Jim can be contacted at:

[email protected]
ACKNOWLEDGEMENTS
We would like to express our gratitude to everyone who made this book about BEAM✲ possible
(using BEAM✲ notation):
CONTENTS
INTRODUCTION
PART I: MODELSTORMING
CHAPTER 1
HOW TO MODEL A DATA WAREHOUSE
OLTP VS. DW/BI: TWO DIFFERENT WORLDS
The Case Against Entity-Relationship Modeling
Advantages of ER Modeling for OLTP
Disadvantages of ER Modeling for Data Warehousing
The Case For Dimensional Modeling
Star Schemas
Fact and Dimension Tables
Advantages of Dimensional Modeling for Data Warehousing
DATA WAREHOUSE ANALYSIS AND DESIGN
Data-Driven Analysis
Reporting-Driven Analysis
Proactive DW/BI Analysis and Design
Benefits of Proactive Design for Data Warehousing
Challenges of Proactive Analysis for Data Warehousing
Proactive Reporting-Driven Analysis Challenges
Proactive Data-Driven Analysis Challenges
Data then Requirements: a ‘Chicken or the egg’ Conundrum
Agile Data Warehouse Design
Agile Data Modeling
Agile Dimensional Modeling
Agile Dimensional Modeling and Traditional DW/BI Analysis
Agile Data-Driven Analysis
Agile Reporting-Driven Analysis
Requirements for Agile Dimensional Modeling
BEAM✲
Data Stories and the 7Ws Framework
Diagrams and Notation
BEAM✲ (Example Data) Tables
BEAM✲ Short Codes
Comparing BEAM✲ and Entity-Relationship Diagrams
Data Model Types
BEAM✲ Diagram Types
SUMMARY

CHAPTER 2
MODELING BUSINESS EVENTS
DATA STORIES
Story Types
Discrete Events
Evolving Events
Recurring Events
Events and Fact Tables
The 7Ws
Thinking Dimensionally
Using the 7Ws: BEAM✲ Sequence
BEAM✲ IN ACTION: TELLING STORIES
1. Discover an Event: Ask “Who Does What?”
Focus on One Event at a Time
Identifying the Responsible Subject
2. Document the Event: BEAM✲ Table
3. Describe the Event: Using the 7Ws
When?
Collecting Event Stories
Event Story Themes
Typical Stories
Different Stories
Repeat Stories
Missing Stories
Group Stories
Additional When Details?
Determining the Story Type
Recurring Event
Evolving Event
Discrete Event
Who?
What?
Where?
How Many?
Unit of Measure
Durations
Derived Quantities
Why?
How?
Event Granularity
Sufficient Detail
Naming the Event
Completing the Event Documentation
THE NEXT EVENT?
SUMMARY

CHAPTER 3
MODELING BUSINESS DIMENSIONS
DIMENSIONS
Dimension Stories
Discovering Dimensions
DOCUMENTING DIMENSIONS
Dimension Subject
Dimension Granularity and Business Keys
DIMENSIONAL ATTRIBUTES
Attribute Scope
Attribute Examples
Descriptive Attributes
Mandatory Attributes
Missing Values
Exclusive Attributes and Defining Characteristics
Using the 7Ws to Discover Attributes
DIMENSIONAL HIERARCHIES
Why Are Hierarchies Important?
Hierarchy Types
Balanced Hierarchies
Ragged Hierarchies
Variable Depth Hierarchies
Multi-Parent Hierarchies
Hierarchy Charts
Modeling Hierarchy Types
Modeling Queries
Discovering Hierarchical Attributes and Levels
Hierarchy Attributes at the Same Level
Hierarchy Attributes that Don’t Belong
Hierarchy Attributes at the Wrong Level
Completing a Hierarchy
DIMENSIONAL HISTORY
Current Value Attributes
Corrections and Fixed Value Attributes
Historic Value Attributes
Telling Change Stories
Documenting CV Change Stories
Documenting HV Change Stories
Business Keys and Change Stories
Detecting Corrections: Group Change Rules
Effective Dating
Documenting the Dimension Type
Minor Events
HV Attributes: Dimension-Only Minor Events
Minor Events within Major Events
SUFFICIENT ATTRIBUTES?
SUMMARY

CHAPTER 4
MODELING BUSINESS PROCESSES
MODELING MULTIPLE EVENTS WITH AGILITY
Conformed Dimensions
The Data Warehouse Bus
The Event Matrix
Event Sequences
Time/Value Sequence
Process Sequence
Modeling Process Sequences as Evolving Events
Using Process Sequences to Enrich Events
MODELSTORMING WITH AN EVENT MATRIX
Adding the First Event to the Matrix ......................................................................................... 106!


Modeling the Next Event .......................................................................................................... 107!
Role-Playing Dimensions ......................................................................................................... 108!
Discovering Process Sequences ............................................................................................. 112!
Using the Matrix to Find Missing Events .................................................................................. 115!
Using the Matrix to Find Missing Event Details........................................................................ 116!
PLAYING THE EVENT RATING GAME ................................................................................................... 116!
MODELING THE NEXT DETAILED EVENT .............................................................................................. 120!
Reusing Conformed Dimensions and Examples...................................................................... 120!
Using Abbreviations in Event Stories .................................................................................... 121!
Adding New Examples to Conformed Dimensions ............................................................... 122!
Modeling New Details and Dimensions.................................................................................... 122!
Completing the Event............................................................................................................... 125!
SUFFICIENT EVENTS? ....................................................................................................................... 126!
SUMMARY ........................................................................................................................................ 128!

CHAPTER 5
MODELING STAR SCHEMAS............................................................................................................. 129
AGILE DATA PROFILING ..................................................................................................................... 130
Identifying Candidate Data Sources......................................................................................... 131
Data Profiling Techniques ........................................................................................................ 132
Missing Values ...................................................................................................................... 132
Unique Values and Frequency.............................................................................................. 133
Data Ranges and Lengths .................................................................................................... 133
Automating Your Own Data Profiling Checks ....................................................................... 134
No Source Yet: Proactive DW/BI Design ................................................................................. 134
Annotating the Model with Data Profiling Results .................................................................... 135
Data Sources and Data Types .............................................................................................. 135
Additional Data...................................................................................................................... 137
Unavailable Data................................................................................................................... 137
Nulls and Mismatched Attribute Descriptions........................................................................ 137
MODEL REVIEW AND SPRINT PLANNING ............................................................................................. 138
Team Estimating ...................................................................................................................... 138
Running a Model Review ......................................................................................................... 139
Sprint Planning......................................................................................................................... 140
STAR SCHEMA DESIGN ..................................................................................................................... 141
Adding Keys to a Dimensional Model ...................................................................................... 141
Choosing Primary Keys: Business Keys vs. Surrogate Keys................................................ 141
Benefits of Data Warehouse Surrogate Keys ....................................................................... 142
Insulate the Data Warehouse from Business Key Change ................................................... 143
Cope with Multiple Business Keys for a Dimension .............................................................. 143
Track Dimensional History Efficiently.................................................................................... 143
Handle Missing Dimensional Values..................................................................................... 143
Support Multi-Level Dimensions ........................................................................................... 144
Protect Sensitive Information ................................................................................................ 144
Reduce Fact Table Size........................................................................................................ 144
Improve Query Performance................................................................................................. 145
Enforce Referential Integrity Efficiently ................................................................................. 145
Slowly Changing Dimensions................................................................................................... 146
Overwriting History: Type 1 Slowly Changing Dimensions ................................................... 147
Tracking History: Type 2 Slowly Changing Dimensions........................................................ 147
Current Values or Historical Values? Why Not Both? ........................................................... 148
Updating the Dimensions ......................................................................................................... 149
Adding Surrogate Keys ......................................................................................................... 149
ETL and Audit Attributes ....................................................................................................... 150
Time Dimensions ..................................................................................................................... 151
Modeling Fact Tables............................................................................................................... 152
Replace Event Details with Dimension Foreign Keys ........................................................... 152
Modeling Degenerate Dimensions ........................................................................................ 153
Modeling Facts...................................................................................................................... 153
Drawing Enhanced Star Schema Diagrams............................................................................. 154
Create a Separate Diagram for Each Fact Table.................................................................. 154
Use a Consistent Star Schema Layout ................................................................................. 155
Display BEAM✲ Short Codes on Star Schemas .................................................................. 155
Avoid the Snowflake Schema Anti-pattern............................................................................ 156
Do Create Rollup Dimensions............................................................................................... 157
CREATING PHYSICAL SCHEMAS ......................................................................................................... 157
Choose BI-Friendly Naming Conventions ................................................................................ 158
Use Data Domains ................................................................................................................... 158
PROTOTYPING THE DW/BI DESIGN .................................................................................................... 159
THE DATA WAREHOUSE MATRIX ........................................................................................................ 160
SUMMARY ........................................................................................................................................ 162

PART II: DIMENSIONAL DESIGN PATTERNS .................................................................................. 163


CHAPTER 6
WHO AND WHAT: DESIGN PATTERNS FOR PEOPLE AND ORGANIZATIONS, PRODUCTS AND
SERVICES ........................................................................................................................................... 165
CUSTOMER DIMENSIONS ................................................................................................................... 166
Mini-Dimension Pattern............................................................................................................ 166
Sensible Snowflaking Pattern .................................................................................................. 170
Swappable Dimension Patterns ............................................................................................... 172
Customer Relationships: Embedded Whos ............................................................................. 173
Recursive Relationship ......................................................................................................... 174
Variable-Depth Hierarchies ................................................................................................... 175
Hierarchy Map Pattern ............................................................................................................. 176
Hierarchy Maps and Type 2 Slowly Changing Dimensions .................................................. 179
Using a Hierarchy Map.......................................................................................................... 179
Displaying a Hierarchy .......................................................................................................... 180
Hierarchy Sequence.............................................................................................................. 181
Drilling Down on Hierarchy Maps.......................................................................................... 182
Querying Multiple Parents..................................................................................................... 182
Building Hierarchy Maps ....................................................................................................... 183
Tracking History for Variable-Depth Hierarchies................................................................... 183
Historical Value Recursive Keys ........................................................................................... 184
The Recursive Key Ripple Effect .......................................................................................... 184
Ripple Effect Benefits............................................................................................................ 185
Ripple Effect Problems.......................................................................................................... 185
EMPLOYEE DIMENSIONS.................................................................................................................... 186
Hybrid SCD View Pattern......................................................................................................... 186
Previous Value Attribute Pattern .............................................................................................. 188
Human Resources Hierarchies ................................................................................................ 189
Multi-Valued Hierarchy Map Pattern ........................................................................................ 190
Additional Multi-Valued Hierarchy Map Attributes................................................................. 191
Handling Multiple Weighting Factors..................................................................................... 192
Updating a Hierarchy Map ....................................................................................................... 192
Historical Multi-Valued Hierarchy Maps ................................................................................... 193
PRODUCT AND SERVICE DIMENSIONS ................................................................................................ 195
Describing Heterogeneous Products ....................................................................................... 196
Balancing Ragged Product Hierarchies ................................................................................... 196
Multi-Level Dimension Pattern ................................................................................................. 198
Parts Explosion Hierarchy Map Pattern ................................................................................... 201
SUMMARY ........................................................................................................................................ 202

CHAPTER 7
WHEN AND WHERE: DESIGN PATTERNS FOR TIME AND LOCATION ........................................ 203
TIME DIMENSIONS............................................................................................................................. 204
Calendar Dimensions............................................................................................................... 205
Date Keys.............................................................................................................................. 206
ISO Date Keys ...................................................................................................................... 207
Epoch-Based Date Keys ....................................................................................................... 207
Populating the Calendar........................................................................................................ 208
BI Tools and Calendar Dimensions....................................................................................... 208
Period Calendars ..................................................................................................................... 209
Month Dimensions ................................................................................................................ 209
Offset Calendars ................................................................................................................... 210
Year-to-Date Comparisons ...................................................................................................... 210
Fact-Specific Calendar Pattern ............................................................................................. 212
Using Fact State Information in Report Footers.................................................................... 213
Conformed Date Ranges ...................................................................................................... 214
CLOCK DIMENSIONS ......................................................................................................................... 214
Day Clock Pattern - Date and Time Relationships................................................................... 215
Time Keys ................................................................................................................................ 216
INTERNATIONAL TIME ........................................................................................................................ 217
Multinational Calendar Pattern................................................................................................. 218
Date Version Keys ................................................................................................................... 220
INTERNATIONAL TRAVEL .................................................................................................................... 221
Time Dimensions or Time Facts? ............................................................................................ 224
NATIONAL LANGUAGE DIMENSIONS .................................................................................................... 225
National Language Calendars.................................................................................................. 225
Swappable National Language Dimensions Pattern................................................................ 225
SUMMARY ........................................................................................................................................ 226

CHAPTER 8
HOW MANY: DESIGN PATTERNS FOR HIGH PERFORMANCE FACT TABLES AND FLEXIBLE
MEASURES ......................................................................................................................................... 227
FACT TABLE TYPES .......................................................................................................................... 228
Transaction Fact Table ............................................................................................................ 228
Periodic Snapshot .................................................................................................................... 229
Accumulating Snapshots.......................................................................................................... 231
FACT TABLE GRANULARITY ............................................................................................................... 233
MODELING EVOLVING EVENTS ........................................................................................................... 233
Evolving Event Measures......................................................................................................... 237
Event Counts......................................................................................................................... 237
State Counts ......................................................................................................................... 237
Durations............................................................................................................................... 238
Additional Process Performance Measures .......................................................................... 238
Event Timelines........................................................................................................................ 238
Using Timelines for Documentation ...................................................................................... 240
Using Timelines for Business Intelligence............................................................................. 240
Developing Accumulating Snapshots....................................................................................... 241
FACT TYPES ..................................................................................................................................... 242
Fully Additive Facts .................................................................................................................. 243
Non-Additive Facts................................................................................................................... 243
Semi-Additive Facts ................................................................................................................. 244
Averaging Issues................................................................................................................... 244
Counting Issues .................................................................................................................... 245
Heterogeneous Facts Pattern .................................................................................................. 246
Factless Fact Pattern ............................................................................................................... 248
FACT TABLE OPTIMIZATION ............................................................................................................... 249
Downsizing............................................................................................................................... 249
Indexing.................................................................................................................................... 250
Partitioning ............................................................................................................................... 251
Aggregation.............................................................................................................................. 252
Lost Dimension Aggregate Pattern ....................................................................................... 252
Shrunken Dimension Aggregate Pattern............................................................................... 253
Collapsed Dimension Aggregate Pattern .............................................................................. 254
Aggregation Guidelines......................................................................................................... 254
Drill-Across Query Pattern ....................................................................................................... 255
Derived Fact Table Patterns .................................................................................................... 258
SUMMARY ........................................................................................................................................ 260

CHAPTER 9
WHY AND HOW: DESIGN PATTERNS FOR CAUSE AND EFFECT ................................................ 261
WHY DIMENSIONS............................................................................................................................. 262
Internal Why Dimensions ......................................................................................................... 262
Unstructured Why Dimensions................................................................................................. 263
External Why Dimensions ........................................................................................................ 264
MULTI-VALUED DIMENSIONS ............................................................................................................. 265
Weighting Factor Pattern ......................................................................................................... 265
Modeling Multi-Valued Groups................................................................................................. 267
Multi-Valued Bridge Pattern ..................................................................................................... 268
Optional Bridge Pattern............................................................................................................ 270
Pivoted Dimension Pattern....................................................................................................... 273
HOW DIMENSIONS ............................................................................................................................ 276
Too Many Degenerate Dimensions?........................................................................................ 277
Creating How Dimensions........................................................................................................ 277
Range Band Dimension Pattern............................................................................................... 278
Step Dimension Pattern ........................................................................................................... 279
Audit Dimension Pattern .......................................................................................................... 281
SUMMARY ........................................................................................................................................ 283

APPENDIX A: THE AGILE MANIFESTO ............................................................................................ 285
MANIFESTO FOR AGILE SOFTWARE DEVELOPMENT ............................................................................. 285
THE TWELVE PRINCIPLES OF AGILE SOFTWARE .................................................................................. 285
APPENDIX B: BEAM✲ NOTATION AND SHORT CODES ............................................................... 287
TABLE CODES .................................................................................................................................. 287
COLUMN CODES ............................................................................................................................... 289
APPENDIX C: RESOURCES FOR AGILE DIMENSIONAL MODELERS........................................... 293
TOOLS: HARDWARE AND SOFTWARE .................................................................................................. 293
BOOKS ............................................................................................................................................. 294
Agile Software Development................................................................................................. 294
Visual Thinking, Collaboration and Facilitation ..................................................................... 294
Dimensional Modeling........................................................................................................... 294
Dimensional Modeling Case Studies .................................................................................... 294
ETL........................................................................................................................................ 294
Database Technology–Specific Advice................................................................................. 294
WEBSITES ........................................................................................................................................ 295
INTRODUCTION
Dimensional modeling is responsible for today’s DW/BI successes, yet we still struggle to deliver enough BI

Dimensional modeling, since it was first popularized by Ralph Kimball in the mid-1990s, has become the accepted data modeling technique for designing the high-performance data warehouses that underpin the success of today’s business intelligence (BI) applications. Yet, with an ever-increasing number of BI initiatives stumbling long before they reach the data modeling phase, it has become clear that Data Warehousing/Business Intelligence (DW/BI) needs new techniques that can revolutionize BI requirements analysis in the same way that dimensional modeling has revolutionized BI database design.

Agile techniques can help, but they must address data warehouse design, not just BI application development

Agile, with its mantra of creating business value through the early and frequent delivery of working software and responding to change, has had just such a revolutionary effect on the world of application development. Can it take on the challenges of DW/BI? Agile’s emphasis on collaboration and incremental development, coupled with techniques such as Scrum and User Stories, will certainly improve BI application development (once a data warehouse is in place). But to truly have an impact on DW/BI, agile must also address data warehouse design itself. Unfortunately, the agile approaches that have emerged so far are vague and non-prescriptive in this one key area. For agile BI to be more than a marketing reboot of business-as-usual business intelligence, it must be agile DW/BI, and we, DW/BI professionals, must do what every true agilist would recommend: adapt agile to meet our needs while still upholding its values and principles (see Appendix A: The Agile Manifesto). At the same time, agilists coming afresh to DW/BI must, for their part, learn our hard-won data lessons.

This book is about BEAM✲: an agile approach to dimensional modeling

With that aim in mind, this book introduces BEAM✲ (Business Event Analysis & Modeling): a set of collaborative techniques for modelstorming BI data requirements and translating them into dimensional models on an agile timescale. We call the BEAM✲ approach “modelstorming” because it combines data modeling and brainstorming techniques for rapidly creating inclusive, understandable models that fully engage BI stakeholders.

BEAM✲ is used for modelstorming BI requirements directly with BI stakeholders

BEAM✲ modelers achieve this by asking stakeholders to tell data stories, using the 7W dimensional types (who, what, when, where, how many, why, and how) to describe the business events they need to measure. BEAM✲ models support modelstorming by differing radically from conventional entity-relationship (ER)-based models. BEAM✲ uses tabular notation and example data stories to define business events in a format that is instantly recognizable to spreadsheet-literate BI stakeholders, yet easily translated into atomic-detailed star schemas. By doing so, BEAM✲ bridges the business-IT gap, creates consensus on data definitions, and generates a sense of business ownership and pride in the resulting database design.
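To make the idea of a tabular data story more concrete, the following Python sketch shows one simplified way such a table could be split into a star schema. It is not BEAM✲ notation or the book’s own procedure: the “customer orders product” event, its column names and example rows, and the splitting rule (each “how many” column becomes a fact; every other column becomes a dimension keyed by an integer surrogate key) are all illustrative assumptions. A real design would also handle concerns this sketch ignores, such as calendar dimensions and tracking history with slowly changing dimensions.

```python
# Hypothetical "customer orders product" event, told as a tabular data story.
# Each column is tagged with one of the 7Ws; "how many" columns hold measures.
EVENT_COLUMNS = {
    "customer": "who",
    "product": "what",
    "order_date": "when",
    "store": "where",
    "quantity": "how many",
    "revenue": "how many",
}

# Illustrative example rows, as a stakeholder might supply them in a spreadsheet.
EXAMPLE_STORIES = [
    {"customer": "Alice", "product": "Widget", "order_date": "2024-01-05",
     "store": "London", "quantity": 2, "revenue": 19.98},
    {"customer": "Bob", "product": "Widget", "order_date": "2024-01-05",
     "store": "Paris", "quantity": 1, "revenue": 9.99},
    {"customer": "Alice", "product": "Gadget", "order_date": "2024-01-06",
     "store": "London", "quantity": 1, "revenue": 24.50},
]


def to_star_schema(columns, rows):
    """Split tabular event rows into dimension lookups and fact rows.

    Every non-"how many" column becomes a dimension that maps each distinct
    value to an integer surrogate key (assigned in order of first appearance);
    "how many" columns are copied into the fact rows as measures.
    """
    dimensions = {name: {} for name, w in columns.items() if w != "how many"}
    facts = []
    for row in rows:
        fact = {}
        for name, w in columns.items():
            if w == "how many":
                fact[name] = row[name]
            else:
                keys = dimensions[name]
                if row[name] not in keys:
                    keys[row[name]] = len(keys) + 1
                # Replace the descriptive value with a dimension foreign key.
                fact[name + "_key"] = keys[row[name]]
        facts.append(fact)
    return dimensions, facts


if __name__ == "__main__":
    dims, facts = to_star_schema(EVENT_COLUMNS, EXAMPLE_STORIES)
    print(dims["customer"])  # {'Alice': 1, 'Bob': 2}
    print(facts[0])
    # {'customer_key': 1, 'product_key': 1, 'order_date_key': 1,
    #  'store_key': 1, 'quantity': 2, 'revenue': 19.98}
```

In the resulting fact rows, each descriptive who, what, when, and where value has been replaced by a surrogate key, while the quantity and revenue measures are kept unchanged; the dimension lookups hold the descriptive values once each.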

Who Is This Book For?


This book is for the whole agile DW/BI team, to help you not only gather BI requirements but also communicate design ideas

This book is intended for data modelers, business analysts, data architects, and developers working on data warehouses and business intelligence systems. All members of an agile DW/BI team, not just those directly responsible for gathering requirements or designing the data warehouse, will find the BEAM✲ notation a powerful addition to standard entity-relationship diagrams for communicating dimensional design ideas and estimating data tasks with their colleagues. To get the most from this book, readers should have a basic knowledge of database concepts such as tables, columns, rows, keys, and joins.

It is aimed at both new and experienced DW/BI practitioners. It’s a quick-study guide to dimensional modeling and a source of new dimensional design patterns

For those new to data warehousing, this book provides a quick-study introduction to dimensional modeling techniques. For those of you who would like more background on the techniques covered, the later chapters and Appendix C provide references to case studies in other texts that will help you gain additional business insight. Experienced data warehousing professionals will find that this book offers a fresh perspective on familiar dimensional modeling patterns, covering many in more detail than previously available, and adding several new ones. For all readers, this book offers a radically new agile way of engaging with business users and kick-starting their next warehouse development project.

Meet the Modelstormers, or How to Use This Book


Hello, I’m over here and I’m your fast track through this book

You may have already noticed the marginalia (non-contagious) on your left at the moment. They provide a “fast track” summary for readers in a hurry. This agile path through our text was inspired by David A. Taylor’s object technology series of books. The margins of this book also contain a cast of anything but marginal characters. They are the modelstormers you need on your agile DW/BI team. We used them to highlight key features in the text such as tips, warnings, references and example modeling dialogues. They appear in the following order (in Chapters 1–9):

The bright modeler, not surprisingly, has some bright ideas. His tips, techniques and
practical modeling advice, distilled from the current topic, will help you improve
your design.

The experienced dimensional modeler has seen it all before. He’s here to warn you
when an activity or decision can steal your time, sanity or agility. Later in the book
he follows the pattern users (see below) to tell you about the consequences or side
effects of using their recommended design patterns. He would still recommend
you use their patterns though—just with a little care.

The note takers are the members of the team who always read the full instructions
before they use that new gadget or technique. They’re always here to tell you to
“make a note of that” when there is extra information on the current topic.

The agilists will let you know when we're being particularly agile. They wave their
banner whenever a design technique supports a core value of the agile manifesto or
principle of agile software development. These are listed in Appendix A.

The modelstormers appear en masse when we describe collaborative modeling and team planning, particularly when we offer practical advice and tips on using whiteboards and other inclusive tools for modelstorming.

The scribe appears whenever we introduce new BEAM✲ diagrams, notation con-
ventions or short codes for rapidly documenting your designs. All the scribe’s short
codes are listed in Appendix B.

The agile modeler engages with stakeholders and facilitates modelstorming. She is
here to ask example BEAM✲ questions, using the 7Ws, to get stakeholders to tell
their data stories.

The stakeholders are the subject matter experts, operational IT staff, BI users and BI
consumers, who know the data sources, or know the data they want—anyone who
can help define the data warehouse who is not a member of the DW/BI develop-
ment team. They are here to provide example answers to the agile modeler’s ques-
tions, tell data stories and pose their own tricky BI questions.

The bookworm points you to further reading on the current topic. All her reading
recommendations are gathered in Appendix C.

The agile developer appears when we have some practical advice about using soft-
ware tools or there is something useful you can download.

The head scratcher has interesting/vexing DW/BI problems or requirements that the data warehouse design is going to have to address.

The pattern users have a solution to the head scratcher’s problems. They’re going to
use tried and tested dimensional modeling design patterns, some new in print.

How This Book Is Organized


This book has two parts. The first part covers agile dimensional modeling for BI
data requirements gathering, while the second part covers dimensional design
patterns for efficient and flexible star schema design.

Part I: Modelstorming
Collaborative modeling with BI stakeholders
Part I describes how to modelstorm BI stakeholders’ data requirements, validate these requirements using agile data profiling, review and prioritize them with stakeholders, estimate their ETL tasks as a team, and convert them into star schemas. It illustrates how agile data modeling can be used to replace traditional BI requirements gathering with accelerated database design, followed by BI prototyping to capture the real reporting and analysis requirements. Chapter 1 provides an introduction to dimensional modeling. Chapters 2 to 4 provide a step-by-step guide for using BEAM✲ to model business events and dimensions. Chapter 5 describes how BEAM✲ models are validated and translated into physical dimensional models and development sprint plans.

Chapter 1: How to Model a Data Warehouse


Why we need new agile approaches for gathering BI requirements. Why they should be dimensional. What they should look like
Data warehouses and operational systems: Understanding the motivation for using dimensional modeling as the basis for agile database design.
Dimensional modeling fundamentals: Contrasting dimensional modeling with entity-relationship (ER) modeling, and learning the basic concepts and vocabulary of facts, dimensions, and star schemas that will be used throughout the book.
Agile data modeling for analysis and design: The BI requirement gathering problem. The challenges and opportunities of proactive DW/BI. The benefits of agile data warehousing. Why model with BI stakeholders? The case for modelstorming: using agile dimensional modeling to gather BI data requirements.
Introduction to BEAM✲: Comparison of BEAM✲ and ER diagrams.

Chapter 2: Modeling Business Events


Step-by-step modeling of a business event using BEAM✲
Discovering business events: Using subjects, verbs, and objects to discover business events and tell data stories.
Documenting business events: Using whiteboards, spreadsheets, and BEAM✲ tables to collaboratively model events.
Discovering event details: Using the 7Ws (who, what, when, where, how many, why, and how) to discover atomic-level event details. Using prepositions to connect details to events, and data story themes to define and document them. Using BEAM✲ short codes to document event story types (discrete, recurring, and evolving) and potential fact table granularity.

Chapter 3: Modeling Business Dimensions


Step-by-step modeling of dimensions and hierarchies
Modeling “detail about detail”: Discovering dimensions and documenting their attributes with stakeholders. Telling dimension stories and overcoming weak narratives.
Discovering dimensional hierarchies: Using hierarchy charts to model hierarchical relationships and discover additional dimensional attributes.
Documenting historical value requirements: Using change stories and BEAM✲
short codes to define and document slowly changing dimension policies for sup-
porting current (as is) and historically correct (as was) analysis views.

Chapter 4: Modeling Business Processes


Step-by-step modeling multiple business events and conformed dimensions
Modeling multiple business events: Modelstorming with an event matrix to storyboard a data warehouse design by identifying and documenting the relationships between events and dimensions. Using event stories to prioritize requirements and plan development sprints.
Modeling for agile data warehouse development: Defining and reusing conformed dimensions. Generalizing dimensions and documenting their roles. Supporting incremental development and creating a data warehouse bus architecture.

Chapter 5: Modeling Star Schemas


Validating stakeholder models and converting them into star schemas
Agile data profiling: Reviewing and adapting stakeholder models to data realities. Using BEAM✲ annotation to document data sources and physical data types, provide feedback to stakeholders on model viability, and help estimate ETL tasks as a team.
Converting BEAM✲ tables to star schemas: Defining and using surrogate keys to complete dimension tables, and converting event tables to fact tables. Using BEAM✲ technical codes to document the database design decisions and generate database schemas using the BEAM✲Modelstormer spreadsheet. Prototyping to define BI reporting requirements. Creating enhanced star schemas and physical dimensional matrices for a technical audience.

Part II: Dimensional Design Patterns


Collaborative modeling within the DW/BI team. Using design patterns associated with each of the 7W dimensional types
Part II covers dimensional modeling techniques for designing high-performance star schemas. For this, we take a design pattern approach using a combination of BEAM✲ and star schema ER notation to capture significant DW/BI requirements, explain their associated issues/problems, and document pattern solutions and the consequences of implementing them. We have organized these design patterns around the 7W dimensional types discovered in Part I. By using the 7Ws to examine the complexities of modeling customers and employees (who), products and services (what), time (when), location (where), business measures (how many), cause (why), and effect (how), we document new and established dimensional techniques from a dimensional perspective for the first time.

Chapter 6: Who and What: People and Organizations, Products and Services


Design patterns for customer, employee and product dimensions
Modeling customers, employees, and organizations: Handling large, rapidly changing dimension populations. Tracking changes using mini-dimensions.
Mixed business models: Using exclusive attributes and swappable dimensions to model heterogeneous customers (businesses and consumers) and products (tangible goods and services).
Advanced slowly changing dimension patterns: Modeling micro- and macro-level change. Supporting simultaneous current, historical, and previous value reporting requirements using hybrid SCD views.
Representing complex hierarchical relationships: Using hierarchy maps to handle recursive hierarchies, such as customer ownership, employee HR reporting structures, and product composition (component bill of materials and product bundles).
Supporting variation within business events: Using multi-level dimensions to describe events with variable granularity, such as sales transactions assigned to individual employees or to teams, or web advertisement impressions for single products or whole product categories.

Chapter 7: When and Where: Time and Location


Design patterns for time and location dimensions
Modeling time dimensionally: Using separate calendar and clock dimensions and defining date keys.
Year-to-date (YTD) analysis: Using fact state tables and fact-specific calendars to support correct YTD comparisons.
Time of day bracketing: Designing custom business clocks that vary by day of
week or time of year.
Multinational calendars: Modeling multinational dimensions that cope with time
and location. Supporting time zones and national language reporting.
Modeling movement: Overloading events with additional time and location
dimensions to understand journeys and trajectories.

Chapter 8: How Many: Facts and Measures and KPIs


Design patterns for modeling efficient fact tables and flexible facts
Designing fact tables for performance and ease of use: Defining the three basic fact table patterns: transactions, periodic snapshots, and accumulating snapshots. Using event timelines to model accumulating snapshots as evolving events.
Providing the basis for flexible measures and KPIs: Defining atomic-level additive facts. Documenting semi-additive and non-additive facts, and understanding their limitations.
Fact table performance optimization: Using indexing, partitioning, and aggrega-
tion to improve fact table ETL and query performance.

Cross-process analysis: Combining the results from multiple fact tables using
drill-across processing and multi-pass queries. Building derived fact tables and
consolidated data marts to simplify query processing.

Chapter 9: Why and How: Cause and Effect


Design patterns for modeling cause and effect
Modeling causal factors: Using promotions, weather, and other causal dimensions to explain why events occur and why facts vary. Using text dimensions to handle unstructured reasons and exception descriptions.
Modeling event descriptions: Using how dimensions to collect any additional
descriptions of an event. Consolidating excessive degenerate dimensions as how
dimensions, and combining small why and how dimensions.
Multi-valued dimensions: Using bridge tables and weighting factors to handle fact
allocation (‘splitting the atom’) when dimensions have multiple values for each
atomic-level fact. Using optional bridge tables and multi-level dimensions to effi-
ciently handle barely multi-valued dimensions. Using pivoted dimensions to
support complex multi-valued constraints.
Providing additional how dimensions: Using step dimensions for understanding
sequential behavior, audit dimensions for tracking data quality/lineage, and range
band dimensions for treating facts as dimensions.

Appendix A: The Agile Manifesto


Appendix A lists the four values of, and the twelve principles behind, the manifesto
for agile software development.

Appendix B: BEAM✲ Table Notation and Short Codes


Appendix B summarizes the BEAM✲ notation used throughout this book for
modeling data requirements, recording data profiling results and representing
physical dimensional modeling design decisions.

Appendix C: Resources for Agile Dimensional Modelers


Appendix C lists books, websites, and tools (hardware and software) that will help
you adopt and adapt the ideas contained in the book.

Companion Website
Visit modelstorming.com to download the BEAM✲Modelstormer spreadsheet
and other templates that accompany this book. On the site you will find example
models and code listings together with links to articles, books, and the worldwide
schedule of training courses and workshops on BEAM✲ and agile data warehouse
design. Register your paperback copy online to receive a discounted eBook version.
PART I: MODELSTORMING
AGILE DIMENSIONAL MODELING, FROM WHITEBOARD TO STAR SCHEMA

Dimensional Modeling: it's too important to be left to data modelers alone


— Anon.

Chapter 1: How to Model a Data Warehouse

Chapter 2: Modeling Business Events

Chapter 3: Modeling Business Dimensions

Chapter 4: Modeling Business Processes

Chapter 5: Modeling Star Schemas


1

HOW TO MODEL A DATA WAREHOUSE


Essentially, all models are wrong, but some are useful.
— George E. P. Box

Dimensional modeling supports data warehouse design
In this first chapter we set out the motivation for adopting an agile approach to data warehouse design. We start by summarizing the fundamental differences between data warehouses and online transaction processing (OLTP) databases to show why they need to be designed using very different data modeling techniques.
We then contrast entity-relationship and dimensional modeling and explain why
dimensional models are optimal for data warehousing/business intelligence
(DW/BI). While doing so we also describe how dimensional modeling enables
incremental design and delivery: key principles of agile software development.

Collaborative dimensional modeling supports agile data warehouse analysis and design
Readers who are familiar with the benefits of traditional dimensional modeling may wish to skip ahead to Data Warehouse Analysis and Design, where we begin the case for agile dimensional modeling. There, we take a step back in the DW/BI development lifecycle, examine the traditional approaches to data requirements analysis, and highlight their shortcomings in dealing with ever more complex data sources and aggressive BI delivery schedules. We then describe how agile data modeling can significantly improve matters by actively involving business stakeholders in the analysis and design process. We finish by introducing BEAM✲ (Business Event Analysis and Modeling): the set of agile techniques for collaborative dimensional modeling described throughout this book.

Chapter 1 Topics At a Glance
Differences between operational systems and data warehouses
Entity-relationship (ER) modeling vs. dimensional modeling
Data-driven analysis and reporting requirements analysis limitations
Proactive data warehouse design challenges
Introduction to BEAM✲: an agile dimensional modeling method


OLTP vs. DW/BI: Two Different Worlds


OLTP and DW/BI have radically different DBMS requirements
Operational systems and data warehouses have fundamentally different purposes. Operational systems support the execution of business processes, while data warehouses support the evaluation of business processes. To execute efficiently, operational systems must be optimized for online transaction processing (OLTP). In contrast, data warehouses must be optimized for query processing and ease of use. Table 1-1 highlights the very different usage patterns and database management system (DBMS) demands of the two types of system.

Table 1-1 Comparison between OLTP databases and Data Warehouses

CRITERIA | OLTP DATABASE | DATA WAREHOUSE
Purpose | Execute individual business processes (“turning the handles”) | Evaluate multiple business processes (“watching the wheels turn”)
Transaction type | Insert, select, update, delete | Select
Transaction style | Predefined: predictable, stable | Ad-hoc: unpredictable, volatile
Optimized for | Update efficiency and write consistency | Query performance and usability
Update frequency | Real-time: when business events occur | Periodic (daily) via scheduled ETL (extract, transform, load); moving to near real-time
Update concurrency | High | Low
Historical data access | Current and recent periods | Current + several years of history
Selection criteria | Precise, narrow | Fuzzy, broad
Comparisons | Infrequent | Frequent
Query complexity | Low | High
Tables/joins per transaction | Few (1–3) | Many (10+)
Rows per transaction | Tens | Millions
Transactions per day | Millions | Thousands
Data volumes | Gigabytes–Terabytes | Terabytes–Petabytes (many sources, history)
Data | Mainly raw detailed data | Detailed data, summarized data, derived data
Design technique | Entity-Relationship modeling (normalization) | Dimensional modeling
Data model diagram | ER diagram | Star schema

The Case Against Entity-Relationship Modeling


ER modeling is used to design OLTP databases
Entity-Relationship (ER) modeling is the standard approach to data modeling for OLTP database design. It classifies all data as one of three things: an entity, a relationship, or an attribute. Figure 1-1 shows an example entity-level ER diagram (ERD). Entities are shown as boxes and relationships as lines linking the boxes. The cardinality of each relationship (the number of possible matching values on either side of the relationship) is shown using crow’s feet for many, | for one, and O for zero (also known as optionality).

Figure 1-1 Entity-Relationship diagram (ERD)

Entities become tables, attributes become columns
Within a relational database, entities are implemented as tables and their attributes as columns. Relationships are implemented either as columns within existing tables or as additional tables, depending on their cardinality. One-to-one (1:1) and many-to-one (M:1) relationships are implemented as columns, whereas many-to-many (M:M) relationships are implemented using additional tables, creating additional M:1 relationships.
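As a minimal sketch of this mapping, the following uses Python’s built-in sqlite3 module. The SEGMENT, CUSTOMER, SERVICE, and SUBSCRIPTION tables are hypothetical and are not taken from Figure 1-1. An M:1 relationship becomes a foreign key column. An M:M relationship becomes an additional table that holds two M:1 relationships.

```python
import sqlite3

# Hypothetical schema illustrating how ER relationships become columns or tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entities become tables; their attributes become columns.
CREATE TABLE segment (segment_id INTEGER PRIMARY KEY, segment_name TEXT NOT NULL);
-- M:1 (many customers to one segment) is implemented as a foreign key column.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    segment_id    INTEGER NOT NULL REFERENCES segment(segment_id)
);
CREATE TABLE service (service_id INTEGER PRIMARY KEY, service_name TEXT NOT NULL);
-- M:M (customers to services) needs an additional table, which resolves it
-- into two M:1 relationships: subscription-to-customer and subscription-to-service.
CREATE TABLE subscription (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    service_id  INTEGER NOT NULL REFERENCES service(service_id),
    PRIMARY KEY (customer_id, service_id)
);
INSERT INTO segment VALUES (1, 'Consumer');
INSERT INTO customer VALUES (1, 'Ann', 1), (2, 'Bob', 1);
INSERT INTO service VALUES (10, 'Broadband'), (20, 'Mobile');
INSERT INTO subscription VALUES (1, 10), (1, 20), (2, 10);
""")

# Each customer can have many services, and each service many customers.
services_per_customer = dict(conn.execute(
    "SELECT customer_id, COUNT(*) FROM subscription GROUP BY customer_id ORDER BY customer_id"
).fetchall())
customers_per_service = dict(conn.execute(
    "SELECT service_id, COUNT(*) FROM subscription GROUP BY service_id ORDER BY service_id"
).fetchall())
print(services_per_customer, customers_per_service)  # {1: 2, 2: 1} {10: 2, 20: 1}
```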

ER models are typically in third normal form (3NF)
ER modeling is associated with normalization in general, and third normal form (3NF) in particular. ER modeling and normalization have very specific technical goals: to reduce data redundancy and make explicit the 1:1 and M:1 relationships within the data that can be enforced by relational database management systems.

Advantages of ER Modeling for OLTP


3NF is efficient for transaction processing
Normalized databases with few, if any, data redundancies have one huge advantage for OLTP: they make write transactions (inserts, updates, and deletes) very efficient. By removing data redundancies, transactions are kept as small and simple as possible. For example, each repeat use of a service by a telecom customer is recorded using tiny references to the customer and service: no unnecessary details are rerecorded each time. When a customer or service detail changes, typically only a single row in a single table needs to be updated. This helps avoid update anomalies that would otherwise leave a database in an inconsistent state.
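The telecom example can be sketched with sqlite3 as follows. The CUSTOMER and SERVICE_USAGE tables are hypothetical. Usage rows store only a small reference to the customer, so a change of address is a one-row update. Every usage row then sees the new value through the join, and none of them is rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, address TEXT);
-- Usage rows store only a tiny reference to the customer, not its details.
CREATE TABLE service_usage (
    usage_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    minutes     INTEGER NOT NULL
);
INSERT INTO customer VALUES (1, '1 Old Street');
INSERT INTO service_usage (customer_id, minutes) VALUES (1, 5), (1, 12), (1, 3);
""")

# The detail change is a single-row update in a single table.
cur = conn.execute("UPDATE customer SET address = '9 New Road' WHERE customer_id = 1")
rows_updated = cur.rowcount

# All usage rows see the new address through the join; none had to be rewritten.
addresses = {row[0] for row in conn.execute(
    "SELECT c.address FROM service_usage u JOIN customer c USING (customer_id)")}
print(rows_updated, addresses)  # 1 {'9 New Road'}
```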

Higher forms of normalization are available, but most ER modelers are satisfied
when their models are in 3NF. There is even a mnemonic to remind everyone that
data in 3NF depends on “The key, the whole key, and nothing but the key, so help
me Codd”—in memory of Edgar (Ted) Codd, inventor of the relational model.

Disadvantages of ER Modeling for Data Warehousing


3NF is inefficient for query processing
Even though 3NF makes it easier to get data in, it has a huge disadvantage for BI and data warehousing: it makes it harder to get the data out. Normalization proliferates tables and join paths, making queries (SQL selects) less efficient and harder to code correctly. For example, looking at the Figure 1-1 ERD, could you estimate how many ways PRODUCT CATEGORY can be joined to ORDER TRANSACTION? A physical 3NF version of the model would contain at least 20 more tables to resolve the M:M relationships. Faced with such 3NF databases, even the simplest BI query requires multiple tables to be joined through multiple intermediate tables. These long join paths are difficult to optimize, and queries invariably run slowly.

3NF models are difficult to understand
More importantly, queries will only produce the right answers if users navigate the right join paths, i.e., ask the right questions in SQL terms. If the wrong joins are used, they unknowingly get answers to some other (potentially meaningless) questions. 3NF models are complex for both people and machines. Specialist hardware (data warehouse appliances) is improving query/join performance all the time, but the human problems are far more difficult to solve. Smart BI software can hide database schema complexity behind a semantic layer, but that merely moves the burden of understanding a 3NF model from BI users at query time to BI developers at configuration time. That’s a good move, but it’s not enough. 3NF models remain too complex for business stakeholders to review and quality assure (QA).

History further complicates 3NF
ER models are further complicated by data warehousing requirements to track history in full to support valid ‘like-for-like’ comparisons over time. Providing a true historical perspective of business events requires that many otherwise simple descriptive attributes become time relationships, i.e., existing M:1 relationships become M:M relationships that translate into even more physical tables and complex join paths. Such temporal database designs can defeat even the smartest BI tools and developers.
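The following sketch uses hypothetical table and column names. It shows what happens when a customer’s city stops being a simple M:1 attribute. The city is now tracked historically in its own table with validity dates, so an “as was” answer needs an extra join with a date-range predicate. This is one common way to hold time-variant attributes, offered for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tracking history turns a simple customer attribute (city) into a time
-- relationship held in its own table with validity dates (hypothetical names).
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_city_history (
    customer_id INTEGER REFERENCES customer(customer_id),
    city        TEXT,
    valid_from  TEXT,  -- ISO dates compare correctly as text
    valid_to    TEXT,
    PRIMARY KEY (customer_id, valid_from)
);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO customer VALUES (1, 'Ann');
INSERT INTO customer_city_history VALUES
    (1, 'Leeds', '2020-01-01', '2022-06-30'),
    (1, 'York',  '2022-07-01', '9999-12-31');
INSERT INTO orders VALUES (100, 1, '2021-03-15'), (101, 1, '2023-01-10');
""")

# An "as was" query needs an extra join with a date-range predicate.
as_was = conn.execute("""
SELECT o.order_id, h.city
FROM orders o
JOIN customer_city_history h
  ON h.customer_id = o.customer_id
 AND o.order_date BETWEEN h.valid_from AND h.valid_to
ORDER BY o.order_id
""").fetchall()
print(as_was)  # [(100, 'Leeds'), (101, 'York')]
```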

Large readable ER diagrams are difficult to draw: all those overlapping lines
Laying out a readable ERD for any non-trivial data model isn’t easy. The mnemonic “dead crows fly east” encourages modelers to keep crows’ feet pointing up or to the left. Theoretically, this should keep the high-volume volatile entities (transactions) top left and the low-volume stable entities (lookup tables) bottom right. However, this layout seldom survives as modelers attempt to increase readability by moving closely related or commonly used entities together. The task rapidly descends into an exercise in trying to reduce overlapping lines. Most ERDs are visually overwhelming for BI stakeholders and developers who need simpler, human-scale diagrams to aid their communication and understanding.

The Case For Dimensional Modeling


Dimensional models appeal to spreadsheet-savvy BI users
Dimensional models define business processes and their individual events in terms of measurements (facts) and descriptions (dimensions), which can be used to filter, group, and aggregate the measurements. Data cubes are often used to visualize simple dimensional models, as in Figure 1-2, which shows the multidimensional analysis of a sales process with three dimensions: PRODUCT (what), TIME (when), and LOCATION (where). At the intersection of these dimensional values there are interesting facts such as the quantity sold, sales revenue, and sales costs. This perspective on the data appeals to many BI users because the three-dimensional cube can be thought of as a stack of two-dimensional spreadsheets. For example, one spreadsheet for each location contains rows for products, columns for time periods, and revenue figures in each cell.
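The “stack of spreadsheets” view can be sketched in plain Python using made-up sales facts. The facts are sliced into one two-dimensional sheet per location. Each sheet has product rows, time-period columns, and summed revenue in each cell.

```python
from collections import defaultdict

# Hypothetical atomic sales facts: (product, period, location, revenue).
sales = [
    ("Widget", "Q1", "London", 100),
    ("Widget", "Q2", "London", 150),
    ("Gadget", "Q1", "London", 80),
    ("Widget", "Q1", "Paris", 60),
    ("Gadget", "Q2", "Paris", 90),
]

def cube_slices(facts):
    """Return {location: {product: {period: revenue}}}: one 2-D sheet per location."""
    sheets = defaultdict(lambda: defaultdict(dict))
    for product, period, location, revenue in facts:
        cell = sheets[location][product]
        cell[period] = cell.get(period, 0) + revenue
    return sheets

sheets = cube_slices(sales)
for location, sheet in sorted(sheets.items()):
    periods = sorted({p for row in sheet.values() for p in row})
    print(location, periods)
    for product, row in sorted(sheet.items()):
        # Empty cells (no sales for that product and period) are shown as 0.
        print("  ", product, [row.get(p, 0) for p in periods])
```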

Figure 1-2 Multidimensional analysis

Star Schemas
Star schemas are used to visualize dimensional models
Real-world dimensional models are used to measure far more complex business processes (with more dimensions) in far greater detail than could be attempted using spreadsheets. While it is difficult to envision models with more than three dimensions as multi-dimensional cubes (they wouldn’t actually be cubes), they can easily be represented using star schema diagrams. Figure 1-3 shows a classic star schema for retail sales containing a fourth (causal) dimension: PROMOTION, in addition to the dimensional attributes and facts from the previous cube example.

Figure 1-3 Sales star schema

Star schema is also the term used to describe the physical implementation of a
dimensional model as relational tables.

Star schema diagrams are non-normalized (N3NF) ER representations of dimensional models. When drawn in a database modeling tool they can be used to generate the SQL for creating fact and dimension tables in relational database management systems. Star schemas are also used to document and define the data cubes of multidimensional databases.
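As an illustration of the kind of SQL such a tool might generate, here is hypothetical DDL for a retail sales star schema like Figure 1-3. It has four dimensions (product, date, store, promotion) around one fact table, and it runs on SQLite through Python. The table and column names are assumptions, not the book’s design.

```python
import sqlite3

# Hypothetical star schema DDL: four dimension tables and one fact table.
DDL = """
CREATE TABLE product_dim   (product_key   INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE date_dim      (date_key      INTEGER PRIMARY KEY, full_date TEXT, month TEXT, quarter TEXT);
CREATE TABLE store_dim     (store_key     INTEGER PRIMARY KEY, store_name TEXT, country TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE sales_fact (
    product_key   INTEGER NOT NULL REFERENCES product_dim(product_key),
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key),
    store_key     INTEGER NOT NULL REFERENCES store_dim(store_key),
    promotion_key INTEGER NOT NULL REFERENCES promotion_dim(promotion_key),
    quantity_sold INTEGER NOT NULL,  -- facts: numeric measures of the sales event
    sales_revenue REAL    NOT NULL,
    sales_cost    REAL    NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Column 2 of PRAGMA foreign_key_list is the referenced (dimension) table.
fk_targets = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)"))
print(tables)
print(fk_targets)
```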

ER diagrams work best for viewing a small number of tables at one time. How
many tables? About as many as in a dimensional model: a star schema.

Fact and Dimension Tables


Star schemas consist of fact and dimension tables
A star schema consists of a central fact table surrounded by a number of dimension tables. The fact table contains facts: the numeric (quantitative) measures of a business event. The dimension tables contain mainly textual (qualitative) descriptions of the event and provide the context for the measures. The fact table also contains dimensional foreign keys; to an ER modeler it represents an M:M relationship between the dimensions. A subset of the dimensional foreign keys forms a composite primary key for the fact table and defines its granularity, or level of detail.
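The following sketch shows how a composite key can enforce a fact table’s granularity, using a hypothetical daily sales fact table in SQLite. The declared grain is “one row per product per store per day”. Here PROMOTION_KEY is a foreign key but not part of the primary key, on the assumption that only one promotion applies at that grain. A second row at the same grain is rejected by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE daily_sales_fact (
    date_key      INTEGER NOT NULL,
    product_key   INTEGER NOT NULL,
    store_key     INTEGER NOT NULL,
    promotion_key INTEGER NOT NULL,  -- assumed dependent on the grain, so not in the key
    quantity_sold INTEGER NOT NULL,
    -- The subset of foreign keys that defines the grain forms the primary key.
    PRIMARY KEY (date_key, product_key, store_key)
)""")
conn.execute("INSERT INTO daily_sales_fact VALUES (20240101, 1, 1, 7, 5)")

try:
    # Same date, product and store: violates the declared granularity.
    conn.execute("INSERT INTO daily_sales_fact VALUES (20240101, 1, 1, 8, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```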

The term dimension in this book refers to a dimension table whereas dimensional
attribute refers to a column in a dimension table.

Dimensional hierarchies support drill-down analysis
Dimensions contain sets of descriptive (dimensional) attributes that are used to filter data and group facts for aggregation. Their role is to provide good report row headers and title/heading/footnote filter descriptions. Dimensional attributes often have a hierarchical relationship that allows BI tools to provide drill-down analysis. For example, BI users can drill down from Quarter to Month, Country to Store, and Category to Product.
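Drill-down along a Quarter to Month hierarchy can be sketched in plain Python using made-up rows. Here the date dimension attributes have already been joined to revenue facts. Grouping by one hierarchy level gives the summary, and adding the next level drills down. The totals still reconcile.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to date dimension attributes.
facts = [
    {"quarter": "2024-Q1", "month": "2024-01", "revenue": 100},
    {"quarter": "2024-Q1", "month": "2024-02", "revenue": 40},
    {"quarter": "2024-Q1", "month": "2024-03", "revenue": 60},
    {"quarter": "2024-Q2", "month": "2024-04", "revenue": 75},
]

def roll_up(rows, *levels):
    """Sum revenue grouped by the given hierarchy levels (more levels = deeper drill-down)."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["revenue"]
    return dict(totals)

by_quarter = roll_up(facts, "quarter")
by_month = roll_up(facts, "quarter", "month")  # drill down one level
print(by_quarter)
print(by_month)
```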

Dimensions are small, fact tables are large
Not all dimensional attributes are text. Dimensions can contain numbers and dates too, but these are generally used like the textual attributes to filter and group the facts rather than to calculate aggregate measures. Despite their width, dimensions are tiny relative to fact tables. Most dimensions contain considerably fewer than a million rows.

The most useful facts are additive measures that can be aggregated using any
combination of the available dimensions. The most useful dimensions provide
rich sets of descriptive attributes that are familiar to BI users.
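The following plain-Python sketch uses made-up facts to show what “additive” buys. Sales revenue is grouped by every possible combination of three hypothetical dimensions, including no dimensions at all. Every grouping re-adds to the same grand total, so the fact can safely be aggregated along any of them.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "store", "month")
# Hypothetical atomic facts; sales_revenue is additive across all dimensions.
facts = [
    {"product": "Widget", "store": "London", "month": "Jan", "sales_revenue": 100},
    {"product": "Widget", "store": "Paris",  "month": "Jan", "sales_revenue": 60},
    {"product": "Gadget", "store": "London", "month": "Feb", "sales_revenue": 80},
    {"product": "Gadget", "store": "Paris",  "month": "Feb", "sales_revenue": 90},
]

def aggregate(rows, dims):
    """Sum sales_revenue grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales_revenue"]
    return totals

grand_total = sum(r["sales_revenue"] for r in facts)
# Every grouping (including the empty one) re-adds to the same grand total.
for k in range(len(DIMENSIONS) + 1):
    for dims in combinations(DIMENSIONS, k):
        assert sum(aggregate(facts, dims).values()) == grand_total
print(grand_total)  # 330
```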

Advantages of Dimensional Modeling for Data Warehousing


Dimensional models maximize query performance and usability
The most obvious advantage of a dimensional model, noticeable in Figure 1-3, is its simplicity. The small number of tables and joins, coupled with the explicit facts in the center of the diagram, makes it easy to think about how sales can be measured and easy to construct the necessary queries. For example, if BI users want to explore product sales by store, only one short join path exists between PRODUCT and STORE: through the SALES FACT table. Limiting the number of tables involved and the length of the join paths in this way maximizes query performance by leveraging DBMS features such as star-join optimization (which processes multiple joins to a fact table in a single pass).
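The “product sales by store” query can be sketched against a cut-down, hypothetical version of the schema. SQLite is used here only because it ships with Python; it is not meant as an example of star-join optimization. The only path between PRODUCT and STORE runs through the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);
CREATE TABLE sales_fact  (product_key INTEGER, store_key INTEGER, sales_revenue REAL);
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO store_dim   VALUES (1, 'London'), (2, 'Paris');
INSERT INTO sales_fact  VALUES (1, 1, 100), (1, 2, 60), (2, 1, 80), (1, 1, 20);
""")

# One short join path: PRODUCT -> SALES FACT <- STORE.
query = """
SELECT p.product_name, s.store_name, SUM(f.sales_revenue) AS revenue
FROM sales_fact AS f
JOIN product_dim AS p ON p.product_key = f.product_key
JOIN store_dim   AS s ON s.store_key   = f.store_key
GROUP BY p.product_name, s.store_name
ORDER BY p.product_name, s.store_name
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```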

Dimensional models are process-oriented. They represent business processes described using the 7Ws framework
A deeper, less immediately obvious benefit of dimensional models is that they are process-oriented. They are not just the result of some aggressive physical data model optimization (that has denormalized a logical 3NF ER model into a smaller number of tables) to overcome the inability of databases to cope with join-intensive BI queries. Instead, the best dimensional models are the result of asking questions to discover which business processes need to be measured, how they should be described in business terms, and how they should be measured. The resulting dimensions and fact tables are not arbitrary collections of denormalized data but the 7Ws that describe the full details of each individual business event worth measuring.

The 7Ws Framework
Who is involved?
What did they do? To what is it done?
When did it happen?
Where did it take place?
HoW many or much was recorded – how can it be measured?
Why did it happen?
HoW did it happen – in what manner?

The 7Ws are interrogatives: question-forming words
The 7Ws are an extension of the 5 or 6Ws that are often cited as the checklist in essay writing and investigative journalism for getting the ‘full’ story. Each W is an interrogative: a word or phrase used to make questions. The 7Ws are especially useful for data warehouse data modeling because they focus the design on BI activity: asking questions.

Fact tables represent verbs (they record business process activity). The facts they contain and the dimensions that surround them are nouns, each classifiable as one of the 7Ws. Six of the Ws (who, what, when, where, why, and how) represent dimension types; the seventh W, how many, represents facts. BEAM✲ data stories use the 7Ws to discover these important verb and noun combinations.

Star schemas usually contain 8-20 dimensions
Detailed dimensional models usually contain more than 6 dimensions because any of the 6Ws can appear multiple times. For example, an order fulfillment process could be modeled with 3 who dimensions: CUSTOMER, EMPLOYEE, and CARRIER, and 2 when dimensions: ORDER DATE and DELIVERY DATE. Having said that, most dimensional models do not have many more than 10 or 12 dimensions. Even the most complex business events rarely have 20 dimensions.
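The classification above can be sketched as a tiny Python check. Each dimension of the order fulfillment example is tagged with the W it answers, and the dimensions are counted per W. The CUSTOMER, EMPLOYEE, CARRIER, ORDER DATE, and DELIVERY DATE tags follow the text. PRODUCT and WAREHOUSE are hypothetical additions. “How many” is rejected because it denotes facts, not a dimension type.

```python
from collections import Counter

# The six dimensional Ws; "how many" is reserved for facts.
DIMENSION_WS = {"who", "what", "when", "where", "why", "how"}

# Order fulfillment example; PRODUCT and WAREHOUSE are hypothetical additions.
order_fulfillment = {
    "CUSTOMER": "who", "EMPLOYEE": "who", "CARRIER": "who",
    "ORDER DATE": "when", "DELIVERY DATE": "when",
    "PRODUCT": "what", "WAREHOUSE": "where",
}

def dimensions_per_w(design):
    """Count dimensions per W, rejecting tags that are not dimension types."""
    unknown = set(design.values()) - DIMENSION_WS
    if unknown:
        raise ValueError(f"not a dimension type: {unknown}")
    return Counter(design.values())

counts = dimensions_per_w(order_fulfillment)
print(counts["who"], counts["when"], sum(counts.values()))  # 3 2 7
```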

Star schemas support agile, incremental BI
The deep benefit of process-oriented dimensional modeling is that it naturally breaks data warehouse scope, design, and development into manageable chunks consisting of just the individual business processes that need to be measured next. Modeling each business process as a separate star schema supports incremental design, development, and usage. Agile dimensional modelers and BI stakeholders can concentrate on one business process at a time to fully understand how it should be measured. Agile development teams can build and incrementally deliver individual star schemas earlier than monolithic designs. Agile BI users can gain early value by analyzing these business processes initially in isolation and then grow into more valuable, sophisticated cross-process analysis. Why develop ten stars when one or two can be delivered far sooner with less investment ‘at risk’?

Dimensional modeling provides a well-defined unit of delivery, the star schema, which supports the agile principles: “Satisfy the customer through early and continuous delivery of valuable software.” and “Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter time scale.”

Data Warehouse Analysis and Design


Analysis techniques are required to discover BI data requirements
Both 3NF ER modeling and dimensional modeling are primarily database design techniques (one arguably more suited to data warehouse design than the other). Prior to using either to design data structures for meeting BI information requirements, some form of analysis is required to discover these requirements. The two approaches commonly used to obtain data warehousing requirements are data-driven analysis (also known as supply-driven) and reporting-driven analysis (also known as demand-driven). While most modern data warehousing initiatives use some combination of the two, Figure 1-4 shows the analysis and design bias of early 3NF enterprise data warehouses compared to that of more recent dimensional data warehouses and data marts.

Figure 1-4 Data warehouse analysis and design biases

Data-Driven Analysis
Pure data-driven analysis avoided early user involvement
Using a data-driven approach, data requirements are obtained by analyzing operational data sources. This form of analysis was adopted by many early IT-led data warehousing initiatives to the exclusion of all others. User involvement was avoided as it was mistakenly felt that data warehouse design was simply a matter of re-modeling multiple data sources using ER techniques to produce a single ‘perfect’ 3NF model. Only after that was built would it be time to approach the users for their BI requirements.

Leading to DW designs that did not meet BI user needs
Unfortunately, without user input to prioritize data requirements and set a manageable scope, these early data warehouse designs were time-consuming and expensive to build. Also, being heavily influenced by the OLTP perspective of the source data, they were difficult to query and rarely answered the most pressing business questions. Pure data-driven analysis and design became known as the “build it and they will come” or “field of dreams” approach, and eventually died out to be replaced by hybrid methods that included user requirements analysis, source data profiling, and dimensional modeling.

Packaged apps are especially challenging data sources to analyze
Data-driven analysis has benefited greatly from the use of modern data profiling tools and methods, but despite their availability, data-driven analysis has become increasingly problematic as operational data models have grown in complexity. This is especially true where the operational systems are packaged applications, such as Enterprise Resource Planning (ERP) systems built on highly generic data models.

IT staff are comfortable with data-driven analysis
In spite of its problems, data-driven analysis continues to be a major source of data requirements for many data warehousing projects because it falls well within the technical comfort zone of IT staff who would rather not get too involved with business stakeholders and BI users.

Reporting-Driven Analysis
Reporting requirements are gathered by interviewing potential BI users in small groups
Using a reporting-driven approach, data requirements are obtained by analyzing the BI users’ reporting requirements. These requirements are gathered by interviewing stakeholders one at a time or in small groups. Following rounds of meetings, analysts’ interview notes and detailed report definitions (typically spreadsheet or word processor mock-ups) are cross-referenced to produce a consolidated list of data requirements that are verified against available data sources. The resulting requirements documentation is then presented to the stakeholders for ratification. After they have signed off the requirements, the documentation is eventually used to drive the data modeling process and subsequent BI development.

User involvement helps to create more successful DWs
Reporting-driven analysis focuses the data warehouse design on efficiently prioritizing the stakeholders’ most urgent reporting requirements and can lead to timely, successful deployments when the scope is managed carefully.

Accretive BI reporting requirements are impossible to capture in full, in advance
Unfortunately, reporting-driven analysis is not without its problems. It is time-consuming to interview enough people to gather ‘all’ the reporting requirements needed to attain an enterprise or even a cross-departmental perspective. Getting stakeholders to think beyond ‘the next set of reports’ and describe longer-term requirements in sufficient detail takes considerable interviewing skill. Even experienced business analysts with generous requirements-gathering budgets struggle, because detailed analytical requirements by their very nature are accretive: they gradually build up layer upon layer. BI users find it difficult to articulate future information needs beyond the ‘next reports’, because these needs are dependent upon the answers the ‘next reports’ will provide, and the unexpected new business initiatives those answers will trigger. The ensuing steps of collating requirements, feeding them back to business stakeholders, gaining consensus on data terms, and obtaining sign-off can also be an extremely lengthy process.

Focusing too closely on current reports alone leads to inflexible dimensional models
Over-reliance on reporting requirements has led to many initially successful data warehouse designs that fail to handle change in the longer term. This typically occurs when inexperienced dimensional modelers produce designs that match the current report requests too closely, rather than treating these reports as clues to discovering the underlying business processes that should be modeled in greater detail to provide true BI flexibility. The problem is often exacerbated by initial requirements analysis taking so long that there isn’t the budget or willpower to swiftly iterate and discover the real BI requirements as they evolve. The resulting inflexible designs have led some industry pundits to unfairly brand dimensional modeling as too report-centric: suitable at the data mart level for satisfying the current reporting needs of individual departments, but unsuitable for enterprise data warehouse design. This is sadly misleading, because dimensional modeling has no such limitation when used correctly to iteratively and incrementally model atomic-level detailed business processes rather than reverse engineer summary report requests.
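A few lines of Python can sketch the difference between modeling atomic-level detail and reverse engineering a summary report. The order-line rows, the attribute names and the `summarize` helper below are hypothetical and chosen only for illustration. The point is that one set of atomic facts can answer both the report that was requested and a later report that nobody anticipated.

```python
from collections import defaultdict

# Hypothetical atomic-level facts: one row per order line (the business event).
ORDER_LINES = [
    {"month": "2024-01", "customer": "Acme Ltd", "product": "Widget", "region": "UK", "quantity": 10, "revenue": 100.0},
    {"month": "2024-01", "customer": "Bob Smith", "product": "Gadget", "region": "US", "quantity": 1, "revenue": 250.0},
    {"month": "2024-02", "customer": "Acme Ltd", "product": "Gadget", "region": "UK", "quantity": 2, "revenue": 500.0},
    {"month": "2024-02", "customer": "Bob Smith", "product": "Widget", "region": "US", "quantity": 5, "revenue": 50.0},
]


def summarize(rows, by, measure):
    """Roll up an additive measure by any combination of dimensional attributes."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[attr] for attr in by)
        totals[key] += row[measure]
    return dict(totals)


# The report that was originally requested: revenue by region.
revenue_by_region = summarize(ORDER_LINES, ["region"], "revenue")

# A later, unanticipated request answered from the same atomic facts.
quantity_by_product_month = summarize(ORDER_LINES, ["product", "month"], "quantity")

# A table designed to match only the first report (region -> revenue) could not
# answer the second request, because the product and month detail is gone.
```

A design that stored only revenue by region would satisfy the first request and nothing else. The atomic rows satisfy both requests.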

Proactive DW/BI Analysis and Design


Early DWs were reactive to OLTP reporting problems
Historically, data warehousing has lagged behind OLTP development (in technology as well as chronology). Data warehouses were often built long after well-established operational systems were found to be inadequate for reporting purposes, and significant BI backlogs had built up. This reactive approach is illustrated on the example timeline in Figure 1-5.

Figure 1-5 Reactive DW timeline

The lag between OLTP and DW roll-out is disappearing
Today, DW/BI has caught up and become proactive. The two different worlds of OLTP and DW/BI have become parallel worlds where many new data warehouses need to be developed and go live concurrently with their new operational source systems, as shown on the Figure 1-6 timeline.

Figure 1-6 Proactive DW timeline

Proactive DW/BI addresses operational demands, avoids interim solutions and preempts BI performance problems
DW/BI has steadily become proactive for a number of business-led reasons:

- DW/BI itself has become more operational. The (largely technical) distinction between operational and analytical reporting has blurred. Increasingly, sophisticated operational processes are leveraging the power of (near real-time) BI, and stakeholders want a one-stop shop for all reporting needs: the data warehouse.

- Organizations (especially those that already have DW/BI success) now realize that, sooner rather than later, each major new operational system will need its own data mart or need to be integrated with an existing data warehouse.

- BI stakeholders simply don’t want to support ‘less than perfect’ interim reporting solutions and suffer BI backlogs.

Benefits of Proactive Design for Data Warehousing


Proactive DW design can improve the data available for BI
When data warehouse design preempts detailed operational data modeling, it can help BI stakeholders set the data agenda, i.e., stipulate their ideal information requirements whilst the new OLTP system is still in development and enhancements can easily be incorporated. This is especially significant for the definition of mandatory data. Vital BI attributes that might have been viewed as optional or insignificant from a purely operational perspective can be specified as not null and captured from day one, before operational users develop bad habits that might have them (inadvertently) circumvent the same enhancements made later. Agile OLTP development teams should welcome these ‘early arriving changes’.
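The following minimal sketch uses Python's standard `sqlite3` module to show a BI-critical attribute specified as mandatory before go-live. The `customer_order` table and its `sales_channel` column are hypothetical. They stand for an attribute that matters more to BI than to day-to-day operations. The NOT NULL constraint is what stops the attribute from being left blank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical OLTP table: sales_channel was requested by BI stakeholders and
# specified as NOT NULL before the operational system went live.
conn.execute("""
    CREATE TABLE customer_order (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        sales_channel TEXT    NOT NULL,  -- vital for BI, 'optional' from an OLTP view
        order_total   REAL    NOT NULL
    )
""")


def try_insert(conn, order_id, customer_id, sales_channel, order_total):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_order VALUES (?, ?, ?, ?)",
                (order_id, customer_id, sales_channel, order_total),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(conn, 1, 42, "web", 99.5)
rejected = try_insert(conn, 2, 43, None, 10.0)  # missing sales_channel
row_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
```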

Proactive DW design can streamline ETL change data capture
ETL processes are often thought of as difficult or impossible to develop without access to stable data sources. However, when a data source hasn’t been defined or is still a moving target, it gives the agile ETL team the chance to define its ‘perfect’ data extraction interface specification based on the proactive data warehouse model, and pass that on to the OLTP development team. This is a great opportunity for ETL designers to ensure that adequate change data capture functionality (e.g., consistently maintained timestamps and update reason codes) is built into all data sources, so that ETL processes can easily detect when data has changed and for what reason: whether genuine change has occurred to previously correct values (that must be tracked historically) or mistakes have been corrected (which need no history).
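This Python sketch shows how consistently maintained timestamps and update reason codes let an ETL process decide whether to keep history. It assumes hypothetical reason codes `CHANGE` and `CORRECTION` and a simplified customer dimension. A genuine change closes the current row and adds a new one. A correction overwrites the current row in place. The sketch illustrates the distinction described above and is not a production change data capture design.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    customer_id: int
    city: str
    updated_at: str     # ISO-8601 timestamp maintained by the source system
    update_reason: str  # hypothetical codes: "CHANGE" or "CORRECTION"


@dataclass
class DimensionRow:
    customer_id: int
    city: str
    valid_from: str
    is_current: bool = True


def apply_changes(dimension, source_rows, last_extract):
    """Apply source rows updated since last_extract to the dimension (in place).

    ISO-8601 strings in the same format sort chronologically, so plain string
    comparison is enough to detect rows changed since the previous extract.
    """
    for row in sorted(source_rows, key=lambda r: r.updated_at):
        if row.updated_at <= last_extract:
            continue  # unchanged since the previous extract
        current = next(
            (d for d in dimension if d.customer_id == row.customer_id and d.is_current),
            None,
        )
        if current is None:
            dimension.append(DimensionRow(row.customer_id, row.city, row.updated_at))
        elif row.update_reason == "CORRECTION":
            current.city = row.city  # fix the mistake in place: no history needed
        else:
            current.is_current = False  # genuine change: keep the old value as history
            dimension.append(DimensionRow(row.customer_id, row.city, row.updated_at))
    return dimension


dim = [
    DimensionRow(1, "Londn", "2024-01-01T00:00:00"),  # loaded with a typo
    DimensionRow(3, "Rome", "2023-06-01T00:00:00"),
]
source = [
    SourceRow(3, "Rome", "2023-06-01T00:00:00", "CHANGE"),        # already loaded
    SourceRow(1, "London", "2024-01-02T09:00:00", "CORRECTION"),  # typo fixed
    SourceRow(2, "Paris", "2024-02-15T08:00:00", "CHANGE"),       # new customer
    SourceRow(1, "Leeds", "2024-03-01T12:00:00", "CHANGE"),       # customer moved
]
apply_changes(dim, source, last_extract="2024-01-01T00:00:00")
```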

When source database schemas are not yet available, ETL development can still proceed if ETL and OLTP designers can agree on flat file data extracts. Once the OLTP team has committed to providing the specified extracts on a schedule that meets BI needs, ETL transformation and load routines can be developed to match this source to the proactive data warehouse design target.
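A flat file extract agreed between ETL and OLTP designers can be treated as an executable contract. In this minimal Python sketch, the column list and types in `EXTRACT_SPEC` are hypothetical. The ETL side rejects files whose header does not match the specification. It also reports rows whose values cannot be converted. This lets both teams test against the agreement before the source schema exists.

```python
import csv
import io
from datetime import date

# Hypothetical agreed extract specification: column name -> converter.
# Each converter raises ValueError (or TypeError) when a value breaks the agreement.
EXTRACT_SPEC = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "customer_id": int,
    "product_code": str,
    "quantity": int,
    "unit_price": float,
}


def read_extract(text):
    """Parse a CSV extract, returning (valid_rows, [(line_number, error), ...])."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(EXTRACT_SPEC):
        raise ValueError(f"extract columns {reader.fieldnames} do not match the agreed spec")
    rows, errors = [], []
    # Line 1 is the header; assumes no quoted fields span multiple lines.
    for line_no, raw in enumerate(reader, start=2):
        try:
            rows.append({col: convert(raw[col]) for col, convert in EXTRACT_SPEC.items()})
        except (ValueError, TypeError) as exc:
            errors.append((line_no, str(exc)))
    return rows, errors


SAMPLE = (
    "order_id,order_date,customer_id,product_code,quantity,unit_price\n"
    "1,2024-01-05,42,P100,3,9.99\n"
    "2,2024-01-05,43,P200,two,19.99\n"
)
rows, errors = read_extract(SAMPLE)
```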

Challenges of Proactive Analysis for Data Warehousing


Proactive analysis takes place before data exists
While being proactive has great potential benefits for DW/BI, the late appearance of data on the Figure 1-6 timeline unfortunately heralds further analysis challenges for data warehouse designers: BI requirements gathering must take place before any real data is available. Under these circumstances proactive data modelers can rely even less upon traditional analysis techniques to provide BI data requirements to match their aggressive schedule.

Proactive Reporting-Driven Analysis Challenges


Reporting-driven analysis is difficult before data exists
Traditional interviewing techniques for gathering reporting requirements are problematic when stakeholders haven’t seen the data or applications that will fuel their BI imagination. With no existing reports to work from, business analysts can’t ask their preferred icebreaker question, “How can your favorite reports be improved?”, and they have nothing to point at and ask, “How do you use this data to make decisions?” Even more open questions such as “What decisions do you make and what information will help you to make them quicker/better?” can fall flat when a new operational system will shortly enable an entirely new business process that stakeholders have no prior experience of measuring or managing.

Proactive Data-Driven Analysis Challenges


Data-driven analysis is impossible with no data to profile
IT cannot fall back on data-driven analysis: data profiling tools and database remodeling skills are of little use when new source databases don’t exist, are still under development, or contain little or no representative data (only test data). Even when new operational systems are implemented using packaged applications with stable, (well) documented database schemas, they are often too complicated for untargeted data profiling: it would take too long and be of little value if only a small percentage of the database is currently used/populated and well understood by the available IT resources.

Data then Requirements: a ‘Chicken or the egg’ Conundrum


Proactive DW design requires a new approach to data analysis, modeling and design
Until there is data and users have lived with it for a time (with less than perfect BI access), neither IT nor business stakeholders can define genuine BI requirements in sufficient detail. Without these early detailed requirements, proactive data warehouse designs routinely fail to provide the right information in time to avoid a BI backlog building up as soon as data is available. To solve this ‘data then requirements’/‘chicken or the egg’ conundrum, proactive data warehousing needs a new approach to database analysis and design: not your father’s data modeling, not even your father’s dimensional modeling!

Agile Data Warehouse Design


Traditional data warehousing follows a near-serial or waterfall approach to design and development
Traditional data warehousing projects follow some variant of waterfall development, as summarized on the Figure 1-7 timeline. The shape of this timeline and the term ‘waterfall’ might suggest that it’s ‘all downhill’ after enough detailed requirements have been gathered to complete the ‘Big Design Up Front’ (BDUF). Unfortunately for DW/BI, this approach relies on a preternatural ability to exhaustively capture requirements upfront. It also postpones all data access, and the hoped-for BI value it brings, until the (bitter) end of the waterfall (or rainbow!). For these reasons pure waterfall (analyze only once, design only once, develop only once, etc.) DW/BI development, whether by design or practice, is rare.

Figure 1-7 Waterfall DW development timeline

Dimensional modeling enables incremental development
Dimensional modeling can help reduce the risks of pure waterfall by allowing developers to release early incremental BI functionality one star schema at a time, get feedback and make adjustments. But even dimensional modeling, like most other forms of data modeling, takes a (near) serial approach to analysis and design (with ‘Big Requirements Up Front’ (BRUF) preceding BDUF data modeling) that is subject to the inherent limitations and initial delays described already.

Agile data warehousing is highly iterative and collaborative
Agile data warehousing seeks to further reduce the risks associated with upfront analysis and provide even more timely BI value by taking a highly iterative, incremental and collaborative approach to all aspects of DW design and development, as shown on the Figure 1-8 timeline.

Figure 1-8 Agile DW development timeline

Agile focuses on the early and frequent delivery of working software that adds value
By avoiding the BDUF and instead doing ‘Just Enough Design Upfront’ (JEDUF) in the initial iterations and ‘Just-In-Time’ (JIT) detailed design within each iteration, agile development concentrates on the early and frequent delivery of working software that adds value, rather than the production of exhaustive requirements and design documentation that describes what will be done in the future to add value.

For DW design, the minimum valuable working software is a star schema
For agile DW/BI, the working software that adds value is a combination of queryable database schemas, ETL processes and BI reports/dashboards. The minimum set of valuable working software that can be delivered per iteration is a star schema, the ETL processes that populate it and a BI tool or application configured to access it. The minimum amount of design is a star.
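The following sketch makes “the minimum amount of design is a star” concrete using Python's built-in `sqlite3` module. It contains one fact table and three dimension tables. A few sample rows stand in for the ETL step, and a star-join query stands in for the BI tool. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One star: a fact table surrounded by its dimension tables.
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, customer_type TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT, category TEXT);
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, calendar_date TEXT, calendar_month TEXT);
    CREATE TABLE fact_order_line (
        date_key     INTEGER REFERENCES dim_date(date_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        quantity     INTEGER,
        revenue      REAL
    );
""")

# Stand-in for the ETL processes: load a little sample data.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Acme Ltd", "Company"), (2, "Bob Smith", "Individual")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(10, "Widget", "Hardware"), (20, "Support Plan", "Services")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240105, "2024-01-05", "2024-01"), (20240211, "2024-02-11", "2024-02")])
conn.executemany("INSERT INTO fact_order_line VALUES (?, ?, ?, ?, ?)", [
    (20240105, 1, 10, 10, 100.0),
    (20240105, 2, 20, 1, 250.0),
    (20240211, 1, 20, 2, 500.0),
])

# Stand-in for the BI tool: a typical star-join query.
REPORT = """
    SELECT d.calendar_month, c.customer_type, SUM(f.revenue) AS revenue
    FROM fact_order_line f
    JOIN dim_date d     ON f.date_key = d.date_key
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY d.calendar_month, c.customer_type
    ORDER BY d.calendar_month, c.customer_type
"""
results = conn.execute(REPORT).fetchall()
```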

Agile database development needs agile data modeling
Designing any significant database schema to match the early and frequent delivery schedule of an agile timeline requires an equally agile alternative to the traditionally serial tasks of data requirements analysis and data modeling.

Agile Data Modeling


Agile data modeling is collaborative and evolutionary
Scott Ambler, author of several books on agile modeling and agile database techniques (www.agiledata.org), defines agile data modeling as follows: “Data modeling is the act of exploring data-oriented structures. Evolutionary data modeling is data modeling performed in an iterative and incremental manner. Agile data modeling is evolutionary data modeling done in a collaborative manner.”

Iterative, incremental and collaborative all have very specific meanings in an agile development context that bring with them significant benefits:

Collaborative modeling combines analysis and design and actively involves stakeholders
- Collaborative data modeling obtains data requirements by modeling directly with stakeholders. It effectively combines analysis and design and ‘cuts to the chase’ of producing a data model (working software and documentation) rather than ‘the establishing shot’ of recording data requirements (only documentation).

Evolutionary modeling supports incremental development by capturing requirements when they grow and change
- Incremental data modeling gives you more data requirements when they are better understood/needed by stakeholders, and when you are ready to implement them. Incremental modeling and development are scheduling strategies that support early and frequent software delivery.

- Iterative data modeling helps you to understand existing data requirements better and improve existing database schemas through refactoring: correcting mistakes and adding missing attributes which have now become available or important. Iterative modeling and development are rework strategies that increase software value.
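Refactoring by adding a newly available attribute, as described for iterative data modeling, might look like the following Python `sqlite3` sketch. The `dim_customer` table, the `customer_type` column and the `add_missing_attribute` helper are hypothetical. Existing rows receive a default of 'Unknown' so that queries written in earlier iterations keep working. The helper does nothing if the change has already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Iteration 1: the dimension as first delivered.
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Acme Ltd"), (2, "Bob Smith")])


def add_missing_attribute(conn, table, column, sql_type, default):
    """Add a NOT NULL column with a default; return False if it already exists.

    Table, column and type names are interpolated into SQL, so they must come
    from trusted model definitions, never from user input.
    """
    existing = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False  # already applied: the refactoring is idempotent
    literal = default.replace("'", "''")  # escape quotes in the default value
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type} NOT NULL DEFAULT '{literal}'")
    return True


# Iteration 2: customer_type has become available and important.
first = add_missing_attribute(conn, "dim_customer", "customer_type", "TEXT", "Unknown")
again = add_missing_attribute(conn, "dim_customer", "customer_type", "TEXT", "Unknown")
conn.execute("UPDATE dim_customer SET customer_type = 'Company' WHERE customer_key = 1")
rows = conn.execute(
    "SELECT customer_key, customer_name, customer_type FROM dim_customer ORDER BY customer_key"
).fetchall()
```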

Agile Dimensional Modeling


DW/BI benefits from agile dimensional modeling
By taking advantage of dimensional modeling’s unit of discovery (a business process worth measuring), agile data modeling has arguably greater benefits for DW/BI than any other type of database project:

Agile dimensional modeling focuses on business processes rather than reports
- Agile modeling avoids the ‘analysis paralysis’ caused by trying to discover the ‘right’ reports amongst the large (potentially infinite?) number of volatile, constantly re-prioritized requests in the BI backlog. Instead, agile dimensional modeling gets everyone to focus on the far smaller (finite) number of relatively stable business processes that stakeholders want to measure now or next.

Agile dimensional modeling creates flexible, report-neutral designs
- Agile dimensional modeling avoids the need to decode detailed business events from current summary report definitions. Modeling business processes without the blinkers of specific report requests produces more flexible, report-neutral, enterprise-wide data warehouse designs.

Agile modeling enables proactive DW/BI to influence operational system development
- Agile data modeling can break the “data then requirements” stalemate that exists for DW/BI just before a new operational system is implemented. Proactive agile dimensional modeling enables BI stakeholders to define new business processes from a measurement perspective and provide timely BI input to operational application development or package configuration.

Evolutionary modeling supports accretive BI requirements
- Agile modeling’s evolutionary approach matches the accretive nature of genuine BI requirements. By following hands-on BI prototyping and/or real BI usage, iterative and incremental dimensional modeling allows stakeholders to (re)define their real data requirements.

Collaborative modeling teaches stakeholders to think dimensionally
- Many of the stakeholders involved in collaborative modeling will become direct users of the finished dimensional data models. Doing some form of dimensional modeling with these future BI users is an opportunity to teach them to think dimensionally about their data and define common, conformed dimensions and facts from the outset.

Collaborative modeling creates stakeholder pride in the data warehouse
- Collaborative modeling fully engages stakeholders in the design process, making them far more enthusiastic about the resultant data warehouse. It becomes their data warehouse; they feel invested in the data model and don’t need to be trained to understand what it means. It contains their consensus on data terms because it is designed directly by them, groups of relevant business experts, rather than being the distillation of many individual report requests interpreted by the IT department.

Never underestimate the affection stakeholders will have for data models that they
themselves (help) create.

Agile Dimensional Modeling and Traditional DW/BI Analysis


Agile dimensional modeling makes traditional analysis tasks agile
Agile dimensional modeling doesn’t completely replace traditional DW/BI analysis tasks, but by preceding both data-driven and reporting-driven analysis it can make them agile too: significantly reducing the work involved while improving the quality and value of the results.

Agile Data-Driven Analysis


Data-driven analysis becomes targeted data profiling
Agile data-driven analysis is streamlined by targeted data profiling. Only the data sources implicated by the agile data model need to be analyzed within each iteration. This targeted profiling supports the agile practice of test-driven development (TDD) by identifying the data sources that will be used to test the data warehouse design and ETL processes ahead of any detailed physical data modeling. If an ETL test can’t be defined because a source isn’t viable, agile data modelers don’t waste time physically modeling what can’t be tested, unless they are doing proactive data warehouse design. In this case the agile data warehouse model can assist the test-driven development of the new OLTP system.
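Targeted data profiling can be sketched as profiling only the source columns that the current iteration's model maps to. In this Python example, the source rows and the `TARGET_COLUMNS` mapping are hypothetical. Columns such as `fax` and `legacy_flag` are never examined.

```python
# Hypothetical source extract: many columns, most irrelevant to this iteration.
SOURCE_ROWS = [
    {"cust_id": 1, "cust_type": "C", "fax": None, "credit_limit": 5000, "legacy_flag": "N"},
    {"cust_id": 2, "cust_type": "I", "fax": None, "credit_limit": None, "legacy_flag": "N"},
    {"cust_id": 3, "cust_type": "C", "fax": "555-0100", "credit_limit": 12000, "legacy_flag": "Y"},
    {"cust_id": 4, "cust_type": None, "fax": None, "credit_limit": 800, "legacy_flag": "N"},
]

# Only the columns the agile data model maps to in this iteration (assumed mapping).
TARGET_COLUMNS = ["cust_id", "cust_type", "credit_limit"]


def profile(rows, columns):
    """Return null count, distinct count, min and max for each targeted column."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


result = profile(SOURCE_ROWS, TARGET_COLUMNS)
```

High null counts, or only one distinct value in a targeted column, are the kind of result that might show a source isn't viable for an ETL test in the current iteration.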

Agile Reporting-Driven Analysis


Reporting-driven analysis becomes BI prototyping
Agile reporting-driven analysis takes the form of BI prototyping. The early delivery of dimensional database schemas enables the early extraction, transformation and loading (ETL) of real sample data, so that better report requirements can be prototyped using the BI users’ actual BI toolset rather than mocked up with spreadsheets or word processors. It is intrinsically fairer to ask users to define their requirements, and developers to commit to them, once everyone has a sense of what their BI tools are capable of, given the available data.

Requirements for Agile Dimensional Modeling


Agile modeling requires both IT and business stakeholders to change their work practices and adopt new tools and techniques:

Collaborative modelers require techniques that encourage interaction
- Collaborative data modeling requires open-minded people. Data modelers must be prepared to meet regularly with stakeholders (take on a business analyst role), while business analysts and stakeholders must be willing to actively participate in some data modeling too. Everyone involved needs simple frameworks, checklists and guidelines that encourage interaction and prompt them through unfamiliar territory.

Collaborative data modeling must use simple, inclusive notation and tools
- Business stakeholders have little appetite for traditional data models, even conceptual models (see Data Model Types, shortly) that are supposedly targeted at them. They find the ER diagrams and notation favored by data modelers (and generated by database modeling tools) too complex or too abstract. To engage stakeholders, agile modelers need to create less abstract, more inclusive data models using simple tools that are easy to use and easy to share. These inclusive models must easily translate into the more technically detailed logical and physical star schemas used by database administrators (DBAs) and ETL/BI developers.

Data modeling sessions (modelstorms) need to be quick: hours rather than days
- To encourage collaboration and support iteration, agile data modeling needs to be quick. If stakeholders are going to participate in multiple modeling sessions, they don’t want each one to take days or weeks. Agile modelers want speed too: they don’t want to wear out their welcome with stakeholders. The best results are obtained by modeling with groups of stakeholders who have the experience and knowledge to define common business terms (conformed dimensions) and prioritize requirements. It is hard enough to schedule long meetings with these people individually, let alone in groups. Agile data modeling techniques must support modelstorming: impromptu stand-up modeling that is quicker, simpler, easier and more fun than traditional approaches.

Agile modelers must balance JIT and JEDUF modeling to reduce design rework
- Stakeholders don’t want to feel that a design is constantly iterating (fixing what they have already paid for) when they want it to be incrementing (adding functionality). They want to see obvious progress and visible results. Agile modelers need techniques that support JIT modeling of current data requirements in detail and JEDUF modeling of ‘the big picture’ to help anticipate future iterations and reduce the amount of design rework.

Evolutionary DW development benefits from tools that support automated testing
- Developers need to embrace database change. They are used to working with (notionally) stable database designs, the by-products of BDUF data modeling. It is ETL/BI support staff who are more familiar with coding around the database changes needed to match users’ real requirements. To respond efficiently to evolutionary data warehouse design, agile ETL and BI developers need tools that support database impact analysis and automated testing.

DW designers must embrace change and allow their models to evolve
- Data warehouse designers also need to embrace data model change. They will naturally want to limit the amount of disruptive database refactoring required by evolutionary design, but they must avoid resorting to generic data model patterns, which reduce understandability and query performance and can alienate stakeholders. Agile data warehouse modelers need dimensional design patterns that they can trust to represent tomorrow’s BI requirements tomorrow, while they concentrate on today’s BI requirements now.
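Automated tests are one way for ETL and BI developers to keep up with evolving designs. This Python sketch uses the standard `unittest` and `sqlite3` modules to run two checks against a tiny star schema. The first check looks for orphaned fact rows, meaning facts whose foreign key has no matching dimension row. The second check confirms that a quantity total is preserved. The schema, the data and the expected total are hypothetical. A real suite would run similar checks after each ETL load or schema refactoring.

```python
import sqlite3
import unittest


def build_warehouse():
    """Create a tiny star schema with sample data (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
        CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER);
        INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO fact_sales VALUES (1, 5), (2, 3), (1, 2);
    """)
    return conn


def orphan_fact_count(conn):
    """Count fact rows whose product_key has no matching dimension row."""
    return conn.execute("""
        SELECT COUNT(*) FROM fact_sales f
        LEFT JOIN dim_product p ON f.product_key = p.product_key
        WHERE p.product_key IS NULL
    """).fetchone()[0]


class StarSchemaTests(unittest.TestCase):
    def setUp(self):
        self.conn = build_warehouse()  # a fresh database for every test

    def tearDown(self):
        self.conn.close()

    def test_no_orphan_facts(self):
        self.assertEqual(orphan_fact_count(self.conn), 0)

    def test_orphan_is_detected(self):
        self.conn.execute("INSERT INTO fact_sales VALUES (99, 1)")
        self.assertEqual(orphan_fact_count(self.conn), 1)

    def test_quantity_total_preserved(self):
        total = self.conn.execute("SELECT SUM(quantity) FROM fact_sales").fetchone()[0]
        self.assertEqual(total, 10)


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StarSchemaTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```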

Agile dimensional modeling techniques exist for addressing these requirements
If agile dimensional modeling that is interactive, inclusive and quick, supports JIT and JEDUF, and enables DW teams to embrace change seems like a tall order, don’t worry: while there are no silver bullets that will make everyone or everything agile overnight, there are proven tools and techniques that can address the majority of these agile modeling prerequisites.

BEAM✲
BEAM✲ is an agile dimensional modeling method
BEAM✲ is an agile data modeling method for designing dimensional data warehouses and data marts. BEAM stands for Business Event Analysis & Modeling. As the name suggests, it combines analysis techniques for gathering business-event-related data requirements with data modeling techniques for database design. The trailing ✲ (a six-point open centre asterisk) represents its dimensional deliverables: star schemas and the dimensional position of each of the 7Ws it uses.

BEAM✲ is used to discover and document business event details
BEAM✲ consists of a set of repeatable, collaborative modeling techniques for rapidly discovering business event details and an inclusive modeling notation for documenting them in a tabular format that is easily understood by business stakeholders and readily translated into logical/physical dimensional models by IT developers.

Data Stories and the 7Ws Framework


BEAM✲ modelers and BI stakeholders use the 7Ws to tell data stories
BEAM✲ gets BI stakeholders to think beyond their current reporting requirements by asking them to describe data stories: narratives that tease out the dimensional details of the business activity they need to measure. To do this, BEAM✲ modelers ask questions using a simple framework based on the 7Ws. By using the 7Ws (who, what, where, when, how many, why and how), BEAM✲ conditions everyone involved to think dimensionally. The questions that BEAM✲ modelers ask stakeholders are the same types of questions that the stakeholders themselves will ask of the data warehouse when they become BI users. When they do, they will be thinking of who, what, when, where, why and how question combinations that measure their business.
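The 7Ws can be treated as a simple checklist for each data story. In this Python sketch, `StoryDetail` records which W a detail answers, and `unanswered_ws` lists the questions that are still to be asked. The example story is hypothetical. The rule that “how many” details become facts and the other six Ws become dimensions is an assumption of this sketch about how the 7Ws map onto a star schema.

```python
from dataclasses import dataclass

SEVEN_WS = ("who", "what", "when", "where", "how many", "why", "how")


@dataclass
class StoryDetail:
    w: str        # which of the 7Ws this detail answers
    name: str     # the stakeholder's own term for it
    example: str  # an example value from the data story


def dimensional_role(detail):
    """Assumed mapping: 'how many' -> fact (measure); every other W -> dimension."""
    if detail.w not in SEVEN_WS:
        raise ValueError(f"not one of the 7Ws: {detail.w!r}")
    return "fact" if detail.w == "how many" else "dimension"


def unanswered_ws(story):
    """List the Ws that the data story has not yet covered, in 7W order."""
    asked = {detail.w for detail in story}
    return [w for w in SEVEN_WS if w not in asked]


# Hypothetical data story: a customer orders a quantity of a product.
STORY = [
    StoryDetail("who", "customer", "Acme Ltd"),
    StoryDetail("what", "product", "Widget"),
    StoryDetail("when", "order date", "2024-01-05"),
    StoryDetail("where", "delivery city", "London"),
    StoryDetail("how many", "quantity ordered", "10"),
    StoryDetail("why", "promotion", "New Year Sale"),
    StoryDetail("how", "order channel", "Web"),
]

roles = {detail.name: dimensional_role(detail) for detail in STORY}
missing_after_three = unanswered_ws(STORY[:3])  # only who, what and when asked so far
```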

Diagrams and Notation


BEAM✲ tables support data modeling by example
Example data tables (or BEAM✲ tables) are the primary BEAM✲ modeling tool and diagram type. BEAM✲ tables are used to capture data stories in tabular form and describe data requirements using example data. By doing so, they support collaborative data modeling by example rather than by abstraction. BEAM✲ tables are typically built up column by column on whiteboards from stakeholders’ responses to the 7Ws, and are then documented permanently using spreadsheets. The resulting BEAM✲ models look more like tabular reports (see Figure 1-9) than traditional data models.

BEAM✲ (Example Data) Tables


BEAM✲ tables look like simple tabular reports
BEAM✲ tables help engage stakeholders who would rather define reports that answer their specific business questions than do data modeling. While example data tables are not reports, they are similar enough for stakeholders to see them as visible signs of progress. Stakeholders can easily imagine sorting and filtering the low-level detail columns of a business event using the higher-level dimensional attributes that they subsequently model.

Figure 1-9 Customer Orders BEAM✲ table

BEAM✲ Short Codes


BEAM✲ uses short codes to capture technical data properties
BEAM✲ tables are simple enough not to get in the way when modeling with stakeholders, but expressive enough to capture real-world data complexities and ultimately document the dimensional modeling design patterns used to address them. To do this, BEAM✲ models use short (alphanumeric) codes: (mainly) two-letter abbreviations of data properties that can be recorded in spreadsheet cells, rather than graphical notation that would require specialist modeling tools. By adding short codes, BEAM✲ tables can be used to:

- Document dimensional attribute properties including history rules
- Document fact properties including aggregation rules
- Record data-profiling results and map data sources to requirements
- Define physical dimensional models: fact and dimension tables
- Generate star schemas

BEAM✲ short codes act as dimensional modeling shorthand
BEAM✲ short codes act as dimensional modelers’ shorthand for documenting generic data properties such as data type and nullability, and specific dimensional properties such as slowly changing dimensions and fact additivity. Short codes can be used to annotate any BEAM✲ diagram type for technical audiences but can easily be hidden or ignored when modeling with stakeholders who are uninterested in the more technical details. Short codes and other BEAM✲ notation conventions will be highlighted in the text in bold. Appendix B provides a reference list of short codes.
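The following Python sketch shows how short codes recorded in spreadsheet cells could be turned into technical properties such as nullability, aggregation rules and history rules. The codes used (`MD`, `FA`, `NA`, `HV`, `CV`) and their meanings are hypothetical placeholders chosen for this illustration, and they may not match the reference list in Appendix B. The column names are also invented.

```python
# Hypothetical short codes for this sketch only (see Appendix B for the real list):
#   MD = mandatory (not null)      HV = keep historic values
#   FA = fully additive fact       CV = keep current value only
#   NA = non-additive fact
KNOWN_CODES = {"MD", "FA", "NA", "HV", "CV"}


def column_properties(name, codes):
    """Translate one column's short codes into nullability, aggregation and history rules."""
    unknown = set(codes) - KNOWN_CODES
    if unknown:
        raise ValueError(f"{name}: unknown short codes {sorted(unknown)}")
    props = {"nullable": "MD" not in codes}
    if "FA" in codes:
        props["aggregate"] = "SUM"
    elif "NA" in codes:
        props["aggregate"] = None  # a non-additive fact must not be summed
    if "HV" in codes:
        props["history"] = "track changes"
    elif "CV" in codes:
        props["history"] = "overwrite"
    return props


# Hypothetical Customer Orders columns, annotated as they might be in a spreadsheet.
CUSTOMER_ORDERS = {
    "customer": ["MD"],
    "customer type": ["MD", "HV"],
    "quantity": ["MD", "FA"],
    "unit price": ["NA"],
}

spec = {column: column_properties(column, codes) for column, codes in CUSTOMER_ORDERS.items()}
```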

Comparing BEAM✲ and Entity-Relationship Diagrams


We will use Pomegranate Corp. examples to illustrate BEAM✲
Throughout this book we will be illustrating BEAM✲ in action with worked examples featuring the fictional Pomegranate Corporation (POM). We begin now by comparing an ER diagram representation of Pomegranate’s order processing data model (Figure 1-10) with an equivalent BEAM✲ table for the Customer Orders event (Figure 1-9).

Figure 1-10 Order processing ER Diagram

Example data models capture more business information than ER models
By looking at the ERD you can tell that customers may place orders for multiple products at a time. The BEAM✲ table records the same information, but the example data also reveals the following:

- Customers can be individuals, companies, and government bodies.
- Products were sold yesterday.
- Products have been sold for 10 years.
- Products vary considerably in price.
- Products can be bundles (made up of 2 products).
- Customers can order the same product again on the same day.
- Orders are processed in both dollars and pounds.
- Orders can be for a single product or bulk quantities.
- Discounts are recorded as percentages and money.

Example data speaks volumes!
Additionally, by scanning the BEAM✲ table you may have already guessed the type of products that Pomegranate sells and come to some conclusions as to what sort of company it is. Example data speaks volumes: wait until you hear what it says about some of Pomegranate’s (fictional) staff!

Data Model Types


Conceptual, logical and physical data models provide progressively more technical detail for more technical audiences

Agile dimensional modelers need to work with different types of models depending on the level of technical detail they are trying to capture or communicate and the technical bias of their collaborators and target audience. Conceptual data models (CDM) contain the least technical detail and are intended for exploring data requirements with non-technical stakeholders. Logical data models (LDM) allow modelers to record more technical details without going down to the database-specific level, while physical data models (PDM) are used by DBAs to create database schemas for a specific DBMS. Table 1-2 shows the level of detail for each model type, its target audience on a DW/BI project, and the BEAM✲ diagram types that support that level of modeling.
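As a rough illustration of the step from a logical to a physical model, the Python sketch below assumes a hypothetical two-entity order model (LOGICAL_MODEL, CUSTOMER, ORDER_HEADER and to_physical_ddl are illustrative names, not from the book) and SQLite as the target DBMS. It renders entities, attributes, primary keys and foreign keys as database-specific CREATE TABLE statements, the kind of output a physical data model is used to produce.

```python
import sqlite3

# Hypothetical logical model: entities with attributes (and data types),
# primary keys and foreign keys. Names are illustrative only.
LOGICAL_MODEL = {
    "CUSTOMER": {
        "attributes": {"customer_id": "INTEGER", "customer_name": "TEXT", "customer_type": "TEXT"},
        "primary_key": "customer_id",
        "foreign_keys": {},
    },
    "ORDER_HEADER": {
        "attributes": {"order_id": "INTEGER", "customer_id": "INTEGER", "order_date": "TEXT"},
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "CUSTOMER"},
    },
}


def to_physical_ddl(model):
    """Render each logical entity as a SQLite CREATE TABLE statement."""
    statements = []
    for table, spec in model.items():
        columns = []
        for column, data_type in spec["attributes"].items():
            suffix = " PRIMARY KEY" if column == spec["primary_key"] else ""
            columns.append(f"{column} {data_type}{suffix}")
        # Foreign keys become table-level constraints after the column list.
        for column, parent in spec["foreign_keys"].items():
            parent_key = model[parent]["primary_key"]
            columns.append(f"FOREIGN KEY ({column}) REFERENCES {parent}({parent_key})")
        statements.append(f"CREATE TABLE {table} (\n  " + ",\n  ".join(columns) + "\n)")
    return statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for ddl in to_physical_ddl(LOGICAL_MODEL):
        conn.execute(ddl)  # proves the generated DDL is valid for this DBMS
        print(ddl + ";")
```

Targeting a different DBMS would change only the rendering step, which is why the physical model is the only one of the three that is database-specific.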
24 Chapter 1

Table 1-2 Data Model Types

DETAIL: Entity Name, Relationship, Attribute (optional), Cardinality (optional), Primary Key, Foreign Key, Data Type (optional), Table Name, Column Name

CONCEPTUAL DATA MODEL
DW/BI Audience: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users
BEAM✲ Diagram: Example Data Table, Hierarchy Chart, Timeline, Event Matrix

LOGICAL DATA MODEL
DW/BI Audience: Data Modelers, ETL Developers, BI Developers
BEAM✲ Diagram: Conceptual Diagrams with Short Codes, Enhanced Star Schema

PHYSICAL DATA MODEL
DW/BI Audience: Data Modelers, DBAs, DBMS, ETL Developers, BI Developers
BEAM✲ Diagram: Enhanced Star Schema, Event Matrix

BEAM✲ and ER notation are jointly used to create collaborative models for different audiences

Based on the detail levels described in Table 1-2, the order processing ERD in Figure 1-10 is a logical data model because it shows primary keys, foreign keys and cardinality, while the BEAM✲ event in Figure 1-9 is a conceptual model (we prefer “business model”) because this information is missing. This information could be added to the BEAM✲ table with additional columns and short codes, but each diagram type suits its target audience as is. BEAM✲ tables are more suitable for collaborative modeling with stakeholders than traditional ERD-based conceptual models, while other BEAM✲ diagram types and short codes complement and enhance ERDs for collaborating with developers on logical/physical star schema design.

BEAM✲ Diagram Types


BEAM✲ also uses event matrices, timelines, hierarchy charts and enhanced star schemas

Example data tables are not the only BEAM✲ modeling tools. BEAM✲ modelers also use event matrices, hierarchy charts, timelines and enhanced star schemas to collaborate on various aspects of the design at different levels of business and technical detail. Table 1-3 summarizes the usage of each of the BEAM✲ diagram types, and lists their model types, audience and the chapter where they are described in detail.

BEAM✲ supports the core agile values: “Individuals and interactions over processes and tools,” “Working software over comprehensive documentation,” and “Customer collaboration over contract negotiation.” BEAM✲ upholds these values and the agile principle of “maximizing the amount of work not done” by encouraging DW practitioners to work directly with stakeholders to produce compilable data models rather than requirements documents, and working BI prototypes of reports/dashboards rather than mockups.

Table 1-3 BEAM✲ Diagram Types

BEAM✲ (Example Data) Table
Usage: Modeling business events and dimensions one at a time using example data to document their 7Ws details. Example data tables are also used to describe physical dimension and fact tables and explain dimensional design patterns.
Data model: Business, Logical, Physical
Audience: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users
Principal chapter: 2

Hierarchy Chart
Usage: Discovering hierarchical relationships within dimensions and prompting stakeholders for dimensional attributes. Hierarchy charts are also used to help define BI drill-down settings and aggregation levels for report and OLAP cube definition.
Data model: Business
Audience: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users
Principal chapter: 3

Timeline
Usage: Exploring time relationships between business events. Timelines are used to discover when details, process sequences and duration facts for measuring process efficiency.
Data model: Business
Audience: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users
Principal chapter: 8

Event Matrix
Usage: Documents the relationships between all the events and dimensions within a model. Event matrices record events in value-chain sequences and promote the definition and reuse of conformed dimensions across dimensional models. They are used instead of high-level ERDs to provide readable overviews of entire data warehouses or multi-star schema data marts.
Data model: Business, Logical, Physical
Audience: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users, ETL Developers, BI Developers
Principal chapter: 4

Enhanced Star Schema
Usage: Visualizing individual dimensional models and generating physical database schemas. Enhanced star schemas are standard stars embellished with BEAM✲ short codes to record dimensional properties and design techniques not directly supported by generic data modeling tools.
Data model: Logical, Physical
Audience: Data Modelers, DBAs, DBMS, ETL Developers, BI Developers, Testers
Principal chapter: 5

Summary
Data warehouses and operational systems are fundamentally different. They have radically
different database requirements and should be modeled using very different techniques.

Dimensional modeling is the appropriate technique for designing high-performance data warehouses because it produces simpler data models (star schemas) that are optimized for business process measurement, query performance and understandability.

Star schemas record and describe the measurable events of business processes as fact tables and dimensions. These are not arbitrary denormalized data structures. Instead they represent the combination of the 7Ws (who, what, when, where, how many, why and how) that fully describe the details of each business event. In doing so, fact tables represent verbs, while the facts (measures) they contain and the dimensions they reference represent nouns.

Dimensional modeling’s process-orientation supports agile development by creating database designs that can be delivered in star schema/business process increments.

Even with the right database design techniques there are numerous analysis challenges in
gathering detailed data warehousing requirements in a timely manner.

Both data-driven and reporting-driven analysis are problematic, and increasingly so as DW/BI development becomes more proactive and takes place in parallel with agile operational application development.

Iterative, incremental and collaborative data modeling techniques are agile alternatives to traditional BI data requirements gathering.

BEAM✲ is an agile data modeling method for engaging BI stakeholders in the design of their
own dimensional data warehouses.

BEAM✲ data stories use the 7Ws framework to discover, describe and document business
events dimensionally.

BEAM✲ modelers encourage collaboration by using simple modeling tools such as whiteboards
and spreadsheets to create inclusive data models.

BEAM✲ models use example data tables and alphanumeric short codes rather than ER data
abstractions and graphical notation to improve stakeholder communication. These models are
readily translated into star schemas.

BEAM✲ is an ideal tool for modelstorming a dimensional data warehouse design.
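To make the summary's description of star schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns and rows are made-up examples, not Pomegranate's: one fact table records a "customer orders product" event, references who, what and when dimensions by key, and carries two how-many facts.

```python
import sqlite3

# Minimal star schema (hypothetical names): one fact table for a "customer orders
# product" event, referencing who/what/when dimensions and holding how-many facts.
SCHEMA = """
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT, customer_type TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE orders_fact (
    customer_key INTEGER REFERENCES customer_dim (customer_key),  -- who
    product_key INTEGER REFERENCES product_dim (product_key),     -- what
    date_key INTEGER REFERENCES date_dim (date_key),              -- when
    quantity INTEGER,                                             -- how many
    revenue REAL                                                  -- how many
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)",
                 [(1, "Acme Ltd", "Company"), (2, "J. Smith", "Individual")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"),
                  (3, "Support Plan", "Service")])
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                 [(20240105, "2024-01-05", "2024-01"), (20240210, "2024-02-10", "2024-02")])
conn.executemany("INSERT INTO orders_fact VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 20240105, 10, 100.0), (2, 2, 20240105, 1, 25.0),
                  (1, 3, 20240210, 1, 50.0)])

# Typical dimensional query: join facts to dimensions, then group by their attributes.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM orders_fact AS f
    JOIN date_dim AS d ON f.date_key = d.date_key
    JOIN product_dim AS p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)  # [('2024-01', 'Hardware', 125.0), ('2024-02', 'Service', 50.0)]
```

Every query follows the same shape (join the fact to the dimensions it needs, then group and filter by descriptive attributes), which is a large part of why star schemas are easy for BI tools and users to understand.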


2
MODELING BUSINESS EVENTS
Think like a wise man but communicate in the language of the people.
— William Butler Yeats (1865–1939)

Business events are the measurable atomic details of business processes

Business events are the individual actions performed by people or organizations during the execution of business processes. When customers buy products or use services, brokers trade stocks, and suppliers deliver components, they leave behind a trail of business events within the operational databases of the organizations involved. These business events contain the atomic-level measurable details of the business processes that DW/BI systems are built to evaluate.

BEAM✲ modelers discover BI data requirements by telling data stories

BEAM✲ uses business events as incremental units of data discovery/data modeling. By prompting business stakeholders to tell their event data stories, BEAM✲ modelers rapidly gather the clear and concise BI data requirements they need to produce efficient dimensional designs.

This chapter is a step-by-step guide to using BEAM✲ tables and the 7Ws to describe event details

In this chapter we begin to describe the BEAM✲ collaborative approach to dimensional modeling, and provide a step-by-step guide to discovering a business event and documenting its data stories in a BEAM✲ table: a simple tabular format that is easily translated into a star schema. By following each step you will learn how to use the 7Ws (who, what, when, where, how many, why, and how) to get stakeholders thinking dimensionally about their business processes and describing the information that will become the dimensions and facts of their data warehouse, one that they themselves helped to design!
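The 7Ws idea can be sketched in a few lines of Python. The event, its column names and their W tags are hypothetical, and this is not BEAM✲ notation; the point is that how-many details tend to become facts, the other Ws become dimension candidates, and any W without a column suggests the next question to ask stakeholders.

```python
SEVEN_WS = ("who", "what", "when", "where", "how many", "why", "how")

# Hypothetical event "customer orders product": each column is tagged with the W
# it answers. Column names and tags are illustrative assumptions.
CUSTOMER_ORDERS = {
    "customer": "who",
    "product": "what",
    "order_date": "when",
    "delivery_address": "where",
    "quantity": "how many",
    "revenue": "how many",
    "promotion": "why",
    "order_id": "how",
}


def split_details(event_columns):
    """How-many details become facts; every other W becomes a dimension candidate."""
    facts = [col for col, w in event_columns.items() if w == "how many"]
    dimensions = [col for col, w in event_columns.items() if w != "how many"]
    return dimensions, facts


def missing_ws(event_columns):
    """List the Ws no column answers yet: prompts for the next modelstorming question."""
    answered = set(event_columns.values())
    return [w for w in SEVEN_WS if w not in answered]


dims, facts = split_details(CUSTOMER_ORDERS)
print("dimensions:", dims)
print("facts:", facts)
print("still to ask about:", missing_ws({"customer": "who", "product": "what", "order_date": "when"}))
```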

Chapter 2 Topics At a Glance

Data stories and story types: discrete, recurring and evolving
Discovering business events: asking “Who does what?”
Documenting events: using BEAM✲ tables
Describing event details: using the 7Ws and story themes
Modelstorming with whiteboards: practical collaborative data modeling

3
MODELING BUSINESS DIMENSIONS
I keep six honest serving-men (They taught me all I knew);
Their names are What and Why and When And How and Where and Who.
— Rudyard Kipling, The Elephant’s Child

Business events need dimensions to fully describe them for reporting purposes

Business events and their numeric measurements are only part of the agile dimensional modeling story. On their own, BEAM✲ event tables are not sufficient to design a data warehouse or even a data mart, because they do not contain all the descriptive attributes required for reporting purposes. For complete BI flexibility, stakeholders need both the atomic-level event details modeled so far and higher-level descriptions that allow those details to be analyzed in practical ways. The data structures that provide these descriptive attributes are dimensions.

BEAM✲ modelers draw hierarchy charts and tell change stories to define dimensions

In addition to the 7Ws and example data tables, BEAM✲ uses hierarchy charts and change stories to discover and define dimensional attributes. Hierarchy charts are used to explore the hierarchical relationships between attributes that support BI drill-down analysis, while change stories allow stakeholders to describe their business rules for handling slowly changing dimensions.
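A hierarchy chart's levels translate directly into drill-down paths. This sketch assumes a hypothetical three-level product hierarchy (product, subcategory, category) and shows that drilling down is simply re-aggregating the same facts at a lower level; the names and figures are illustrative.

```python
from collections import defaultdict

# Hypothetical product hierarchy: product -> (subcategory, category).
PRODUCT_HIERARCHY = {
    "Widget": ("Widgets", "Hardware"),
    "Widget Pro": ("Widgets", "Hardware"),
    "Gadget": ("Gadgets", "Hardware"),
    "Support Plan": ("Plans", "Services"),
}
LEVELS = ("product", "subcategory", "category")  # lowest to highest

# Hypothetical atomic facts: (product, revenue).
SALES = [("Widget", 100.0), ("Widget Pro", 25.0), ("Gadget", 40.0), ("Support Plan", 50.0)]


def roll_up(sales, level):
    """Total revenue at one hierarchy level; drilling down = choosing a lower level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(float)
    for product, revenue in sales:
        subcategory, category = PRODUCT_HIERARCHY[product]
        members = dict(zip(LEVELS, (product, subcategory, category)))
        totals[members[level]] += revenue
    return dict(totals)


for level in reversed(LEVELS):  # drill down: category -> subcategory -> product
    print(level, roll_up(SALES, level))
```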

This chapter shows you how to model dimensions from event story details

In this chapter we describe how these BEAM✲ tools and techniques are used to model complete dimension definitions from individual event details. We will use the CUSTOMER and PRODUCT event story details from Chapter 2 for our example dimension modelstorming with stakeholders.

Chapter 3 Topics At a Glance

Modeling the dimensions of a business event
Using the 7Ws and BEAM✲ tables to define dimensional attributes
Drawing hierarchy charts to model dimensional hierarchies
Telling change stories to describe dimensional history

4

MODELING BUSINESS PROCESSES


The only reason for time is so that everything doesn't happen at once
— Albert Einstein

BI stakeholders need multiple events for process measurement

Designing a data warehouse or data mart for business process measurement demands that you quickly move beyond modeling single business events. All but the simplest business processes are made up of multiple business events, and BI stakeholders invariably want to do cross-process analysis. When you modelstorm these multi-event requirements you soon notice two crucial things:

Event sequences represent business processes and value chains

Stakeholders model events chronologically. As you complete one event, stakeholders naturally think of related events that immediately follow or precede it. These event sequences represent business processes and value chains that need to be measured end-to-end.

Events share common dimensions that support cross-process analysis

Stakeholders describe different events using many of the same 7Ws. When you define an event in terms of its 7Ws, stakeholders start thinking of other events with the same details, especially events that share its subject or object. These shared details, known to dimensional modelers as conformed dimensions, are the basis for cross-process analysis.

The event matrix is an agile tool for modeling multiple events

In this chapter we describe how an event matrix, the single most powerful BEAM✲ artifact, is used to storyboard the data warehouse: rapidly model multiple events, identify significant business processes and conformed dimensions, and prioritize their development.
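An event matrix is, in essence, a grid of events against the dimensions they use. The hypothetical sketch below holds one as a dictionary of sets, prints it as a grid, and flags dimensions used by more than one event as candidates for conformance; the event and dimension names are assumptions for illustration.

```python
# Hypothetical event matrix: each business event (rows, in value-chain order)
# mapped to the set of dimensions it uses (the X's in the matrix columns).
EVENT_MATRIX = {
    "Customer Orders": {"Customer", "Product", "Date", "Salesperson"},
    "Product Shipments": {"Customer", "Product", "Date", "Warehouse"},
    "Customer Payments": {"Customer", "Date", "Payment Method"},
}


def conformed_dimensions(matrix):
    """Dimensions shared by more than one event: candidates for conformance."""
    usage = {}
    for dims in matrix.values():
        for dim in dims:
            usage[dim] = usage.get(dim, 0) + 1
    return sorted(dim for dim, n in usage.items() if n > 1)


def print_matrix(matrix):
    """Print events as rows and dimensions as columns, marking usage with X."""
    dims = sorted(set().union(*matrix.values()))
    width = max(len(event) for event in matrix)
    print(" " * width, *dims, sep=" | ")
    for event, used in matrix.items():
        cells = [("X" if dim in used else "").center(len(dim)) for dim in dims]
        print(event.ljust(width), *cells, sep=" | ")


print_matrix(EVENT_MATRIX)
print("Conformed dimension candidates:", conformed_dimensions(EVENT_MATRIX))
```

Counting how many events each dimension supports is also one simple input to prioritizing which dimensions to build first.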

Chapter 4 Topics At a Glance

The importance of conformed dimensions for agile DW design
Modelstorming event sequences with an event matrix
Prioritizing event and dimension development using Scrum
Modeling event stories with conformed dimensions and examples
5
MODELING STAR SCHEMAS
We are all in the gutter, but some of us are looking at the stars.
— Oscar Wilde

In this chapter we describe the star schema design process for converting BEAM✲ models into flexible and efficient dimensional data warehouse models. This chapter is a guide to:

Verifying BEAM✲ models against available data sources

The agile approach that we take begins with test-first design, by using data profiling techniques to verify the BEAM✲ model against the data available in source systems. This results in an annotated model which documents source data characteristics and issues. This is used for model review with stakeholders and development sprint planning with the DW/BI team.
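Data profiling for test-first design can start very simply. This is a minimal sketch, assuming a hypothetical source column and a set of example values gathered from stakeholders; profile_column is an illustrative helper, not a real profiling tool's API. Its metrics are the sort of source data characteristics the text describes annotating on a BEAM✲ model.

```python
from collections import Counter


def profile_column(values):
    """Basic profile metrics for one source column (a sketch, not a full profiler)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical extract of a source CUSTOMER_TYPE column.
source_customer_type = ["Company", "Individual", None, "Company", "Government"]

# Hypothetical values stakeholders gave in their example data.
stakeholder_examples = {"Individual", "Company", "Government"}

profile = profile_column(source_customer_type)
unexpected = set(source_customer_type) - stakeholder_examples - {None}  # in source, not in model
unsupported = stakeholder_examples - set(source_customer_type)          # in model, not in source
print(profile)
print("annotate: nullable" if profile["nulls"] else "annotate: mandatory")
print("unexpected:", unexpected, "unsupported:", unsupported)
```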

Converting BEAM✲ models into star schemas

Next, the revised BEAM✲ model is translated into a logical dimensional model by adding surrogate keys. The resulting facts and dimensions are documented by drawing enhanced star schemas using a combination of BEAM✲ and ER notation.
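Adding surrogate keys is the mechanical core of translating example data into a star. The sketch below assumes hypothetical rows and an illustrative SurrogateKeyMap class: each distinct business key receives a meaningless integer key, dimension rows are emitted from the map, and fact rows carry only the keys and the measures. A production warehouse would normally generate keys in the database or ETL tool instead.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns meaningless integer surrogate keys to business keys, in arrival order."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, business_key):
        if business_key not in self._keys:
            self._keys[business_key] = next(self._next_key)
        return self._keys[business_key]

    def dimension_rows(self):
        return [(key, business_key) for business_key, key in self._keys.items()]


# Hypothetical example data rows: (customer, product, order date, quantity).
BEAM_ROWS = [
    ("Acme Ltd", "Widget", "2024-01-05", 10),
    ("J. Smith", "Widget", "2024-01-05", 1),
    ("Acme Ltd", "Gadget", "2024-02-10", 3),
]


def to_star(rows):
    """Split example rows into customer and product dimensions plus key-based facts.
    The order date is left as-is for brevity; it would normally map to a date key."""
    customers, products = SurrogateKeyMap(), SurrogateKeyMap()
    facts = [
        (customers.key_for(c), products.key_for(p), order_date, qty)
        for c, p, order_date, qty in rows
    ]
    return customers.dimension_rows(), products.dimension_rows(), facts


customer_dim, product_dim, order_facts = to_star(BEAM_ROWS)
print(customer_dim)
print(product_dim)
print(order_facts)
```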

Validating DW designs by prototyping

Finally, the star schemas are used to generate physical data warehouse schemas which are validated by BI prototyping and documented by creating a physical dimensional matrix.

Chapter 5 Topics At a Glance

Data profiling to verify stakeholder data requirements
Annotating BEAM✲ models with data sources and profile metrics
Reviewing annotated models and planning development sprints
Converting BEAM✲ models into logical/physical dimensional models
The importance of data warehouse surrogate keys
Designing for slowly changing dimensions
Defining additive facts
Drawing enhanced star schema diagrams and creating physical schemas
BI prototyping to validate dimensional models
Creating a physical dimensional matrix
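One common way to honor a change story that asks for full history is a Type 2 slowly changing dimension. The sketch below assumes a hypothetical customer dimension in which only city is tracked: a change expires the current row and adds a new row under a new surrogate key, so facts loaded earlier keep pointing at the version that was true at the time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    surrogate_key: int        # warehouse key; a new one per version
    customer_id: str          # business key from the source system
    city: str                 # attribute tracked as Type 2 (history preserved)
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_type2_change(versions: List[CustomerVersion], customer_id: str,
                       new_city: str, change_date: date) -> List[CustomerVersion]:
    """Expire the current version and append a new one if the tracked value changed."""
    current = next(v for v in versions
                   if v.customer_id == customer_id and v.valid_to is None)
    if current.city == new_city:
        return versions  # nothing to record
    current.valid_to = change_date  # half-open interval: [valid_from, valid_to)
    new_key = max(v.surrogate_key for v in versions) + 1
    versions.append(CustomerVersion(new_key, customer_id, new_city, change_date, None))
    return versions


versions = [CustomerVersion(1, "C042", "London", date(2020, 1, 1), None)]
apply_type2_change(versions, "C042", "Leeds", date(2024, 3, 1))
for v in versions:
    print(v)
```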

PART II: DIMENSIONAL DESIGN PATTERNS
DIMENSIONAL MODELING TECHNIQUES FOR PERFORMANCE, FLEXIBILITY, AND USABILITY

Computers are to design as microwaves are to cooking.


— Milton Glaser

Chapter 6: Who and What: People and Organizations, Products and Services

Chapter 7: When and Where: Time and Location

Chapter 8: How Many: Facts and Measures

Chapter 9: Why and How: Cause and Effect


6
WHO AND WHAT
Dimensional Design Patterns for People and Organizations, Products and Services

Who’s on first?
— Bud Abbott and Lou Costello

What’s next?
— President Jed Bartlet, “The West Wing”

Who and what are the most important dimensions

Who and what dimensions such as CUSTOMER, EMPLOYEE and PRODUCT represent some of the most interesting, highly scrutinized, and complex dimensions of a data warehouse. Modeling these dimensions and their inherent hierarchies presents a number of challenges that can be addressed by design patterns.

This chapter describes design patterns for defining flexible, high-performance who and what dimensions

In the first of our W-themed design pattern chapters we begin by describing mini-dimensions and snowflaking for handling large, volatile customer dimensions, swappable dimensions for mixed customer type models and hierarchy maps for recursive customer relationships. We then move on to employee dimensions to cover hybrid SCD views for current value/historic value (CV/HV) reporting requirements and multi-valued hierarchy maps for multi-parent HR hierarchies with dotted-line relationships. We finish by looking at product and service dimension issues and introduce multi-level dimensions for variable detail facts and reverse hierarchy maps for component analysis.
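A hierarchy map for recursive relationships can be thought of as a table of every ancestor and descendant pair with its depth. The sketch below builds one from a hypothetical parent-child customer list (assumed acyclic) and uses it to total revenue for a parent organization including all of its subsidiaries, without recursive queries at report time. It is an illustration of the idea, not the book's exact table design.

```python
# Hypothetical recursive customer relationships: member -> parent (None at the top).
PARENT_OF = {
    "Global Corp": None,
    "Global UK": "Global Corp",
    "Global UK North": "Global UK",
    "Global US": "Global Corp",
}

# Hypothetical revenue facts recorded against each individual customer.
REVENUE = {"Global Corp": 10.0, "Global UK": 20.0, "Global UK North": 5.0, "Global US": 40.0}


def hierarchy_map(parent_of):
    """Return (parent, subsidiary, depth) rows for every ancestor/descendant pair,
    including each member paired with itself at depth 0. Assumes no cycles."""
    rows = []
    for member in parent_of:
        ancestor, depth = member, 0
        while ancestor is not None:
            rows.append((ancestor, member, depth))
            ancestor, depth = parent_of[ancestor], depth + 1
    return rows


def total_including_subsidiaries(rows, parent, facts):
    """Sum facts for a parent and everything beneath it via the map (no recursion)."""
    return sum(facts[sub] for anc, sub, _depth in rows if anc == parent)


HIERARCHY_MAP = hierarchy_map(PARENT_OF)
print(total_including_subsidiaries(HIERARCHY_MAP, "Global UK", REVENUE))    # 25.0
print(total_including_subsidiaries(HIERARCHY_MAP, "Global Corp", REVENUE))  # 75.0
```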

Chapter 6 Design Challenges At a Glance

Large, rapidly changing customer populations
Mixed business models: businesses and consumers, products and services
Simultaneous current and historic value reporting requirements
Variable-depth hierarchies, recursive relationships
Multi-valued hierarchies
Business processes with variable levels of dimensional detail
Product bill of materials
7

WHEN AND WHERE


Dimensional Design Patterns for Time and Location

The past is a foreign country: they do things differently there.


— L.P. Hartley, The Go-Between

Time is the most frequently used dimension for BI analysis

Every business event happens at a point in time or represents an interval of time. Time is the primary way that BI queries group (“show me monthly totals”), filter (“show me sales for Financial Q1”), and compare business events (“How are we doing year to date, versus last year?”). That is why every fact table has at least one time (when) dimension.
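Because every fact table has a when dimension, a pre-built calendar table is the usual starting point. This sketch generates a few date dimension attributes in Python; the attribute names and the April fiscal year start are assumptions for illustration, and a real date dimension would carry many more columns.

```python
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 4  # assumption: the fiscal year starts in April


def fiscal_quarter(d: date, start_month: int = FISCAL_YEAR_START_MONTH) -> int:
    """Quarter number (1-4) within a fiscal year beginning in start_month."""
    months_into_fiscal_year = (d.month - start_month) % 12
    return months_into_fiscal_year // 3 + 1


def date_dimension(start: date, end: date) -> list:
    """One row per calendar day between start and end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # e.g. 20240401
            "calendar_date": d.isoformat(),
            "month": d.strftime("%Y-%m"),
            "calendar_quarter": (d.month - 1) // 3 + 1,
            "fiscal_quarter": fiscal_quarter(d),
            "day_of_year": d.timetuple().tm_yday,
        })
        d += timedelta(days=1)
    return rows


for row in date_dimension(date(2024, 3, 30), date(2024, 4, 2)):
    print(row)
```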

Location dimensions and attributes are frequently used too

Most business events occur at a specific geographical or online location. Many interesting events represent changes of location. Hence, a large number of fact tables have distinct where dimensions in addition to the location attributes that can be found in who and what dimensions, such as customer and product.

Time and location are separate dimensions but can affect one another

Although when and where are separate dimensions, they can influence one another: time zones, holidays and seasons are all examples of location-specific time attributes that are affected by event geography. Similarly, analytically significant locations such as the first and last locations in a sequence of events are timing-specific location dimensions, affected by event chronology.
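The influence of where on when is easy to demonstrate with time zones: one instant can fall on different local dates. This sketch assumes hypothetical stores with fixed UTC offsets, which is a simplification, since real time zones also apply daylight saving rules.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical store locations with fixed UTC offsets. This ignores daylight
# saving time; a real implementation would use IANA zones (e.g. zoneinfo).
STORE_UTC_OFFSETS = {
    "London": timedelta(hours=0),
    "New York": timedelta(hours=-5),
    "Tokyo": timedelta(hours=9),
}


def local_date(utc_timestamp: datetime, store: str) -> date:
    """Convert an event's UTC timestamp to the calendar date at the store."""
    local = utc_timestamp.astimezone(timezone(STORE_UTC_OFFSETS[store]))
    return local.date()


event_time = datetime(2024, 1, 5, 2, 30, tzinfo=timezone.utc)
for store in STORE_UTC_OFFSETS:
    print(store, local_date(event_time, store))
```

Here the same 02:30 UTC event lands on 4 January in New York but 5 January in London and Tokyo, so a daily report's answer depends on which local date the when dimension records.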

This chapter describes when and where patterns

In this chapter, we describe dimensional design patterns for efficiently handling time and location; in particular, patterns for correctly analyzing year-to-date facts and journeys (facts that represent changes in space and time, and are all about where and when).
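One year-to-date pitfall is comparing a partial current year with a complete prior year. The sketch below, with hypothetical sales, compares like periods by applying the same month and day cutoff to both years; it is an illustration rather than the book's design pattern.

```python
from datetime import date

# Hypothetical daily sales facts: (date, revenue).
SALES = [
    (date(2023, 1, 10), 100.0), (date(2023, 3, 5), 50.0), (date(2023, 9, 1), 70.0),
    (date(2024, 1, 12), 120.0), (date(2024, 2, 28), 30.0),
]


def year_to_date(sales, year, as_of_month, as_of_day):
    """Sum facts from 1 January of `year` up to and including the given month/day."""
    cutoff = (as_of_month, as_of_day)
    return sum(amount for d, amount in sales
               if d.year == year and (d.month, d.day) <= cutoff)


this_ytd = year_to_date(SALES, 2024, 3, 1)
last_ytd = year_to_date(SALES, 2023, 3, 1)                    # like-for-like period
last_full_year = sum(a for d, a in SALES if d.year == 2023)   # NOT comparable
print(this_ytd, last_ytd, last_full_year)  # 150.0 100.0 220.0
```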

Chapter 7 Design Challenges At a Glance

Efficient date and time reporting
Correct year-to-date analysis
Time zones, international holidays and seasons
National language support
Trip and journey analysis

8
HOW MANY
Design Patterns for High Performance Fact Tables and Flexible Measures

How many times must a man look up…


— Bob Dylan, Blowin’ in the Wind

Everything that can be counted does not necessarily count;
everything that counts cannot necessarily be counted.
— Albert Einstein

This chapter covers techniques for incrementally designing and developing high-performance fact tables and flexible measures

In this chapter we describe how the three fact table patterns (transaction fact tables, periodic snapshots, and accumulating snapshots) are implemented to efficiently measure discrete, recurring and evolving business events. We particularly focus on the agile design of accumulating snapshots, by describing how the requirements for these powerful but complex fact tables can be visually modeled as evolving events using event timelines, our final BEAM✲ modelstorming tool. We also describe the BEAM✲ notation for capturing fact additivity and fully documenting the limitations of semi-additive facts, such as balances. We conclude with techniques for optimizing fact table performance and multi-fact table reporting, concentrating on design patterns for aggregates and other derived fact tables that accelerate and simplify BI queries.
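Semi-additive facts such as balances can be summed across some dimensions but not across time. In this hypothetical month-end snapshot, totals across accounts are meaningful, while a single account's balances must be averaged or taken at period end instead of summed.

```python
# Hypothetical periodic snapshot: (month, account, month-end balance).
BALANCES = [
    ("2024-01", "ACC-1", 100.0), ("2024-01", "ACC-2", 50.0),
    ("2024-02", "ACC-1", 120.0), ("2024-02", "ACC-2", 30.0),
]


def total_by_month(snapshot):
    """Additive across accounts: summing balances within one month is valid."""
    totals = {}
    for month, _account, balance in snapshot:
        totals[month] = totals.get(month, 0.0) + balance
    return totals


def average_balance(snapshot, account):
    """Not additive across time: use an average over the periods instead of a sum."""
    values = [b for _m, acc, b in snapshot if acc == account]
    return sum(values) / len(values)


def closing_balance(snapshot, account):
    """...or the balance at the latest period (month strings sort chronologically)."""
    return max((m, b) for m, acc, b in snapshot if acc == account)[1]


print(total_by_month(BALANCES))            # {'2024-01': 150.0, '2024-02': 150.0}
print(average_balance(BALANCES, "ACC-1"))  # 110.0
print(closing_balance(BALANCES, "ACC-1"))  # 120.0
```

Summing ACC-1 across both months would give 220.0, a number that matches no balance the account ever held.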

Chapter 8 Design Challenges At a Glance

Point in time event measurement
Periodic measurement
Evolving process measurement
Modeling evolving event milestones and duration measures
Incremental development of complex fact tables
Flexible fact definition
Fact table performance
Correctly querying multiple fact tables at once
Cross-process analysis using simple BI tools
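Correctly querying multiple fact tables usually means aggregating each fact table separately to a conformed grain and then merging the results, rather than joining fact rows directly. This sketch assumes two hypothetical fact tables sharing a conformed product attribute and shows the over-counting that a direct join produces.

```python
# Two hypothetical fact tables sharing a conformed product attribute.
ORDERS = [("Widget", 10), ("Gadget", 3), ("Widget", 5)]     # (product, quantity ordered)
SHIPMENTS = [("Widget", 12), ("Gadget", 1), ("Gizmo", 4)]   # (product, quantity shipped)


def aggregate(facts):
    totals = {}
    for product, qty in facts:
        totals[product] = totals.get(product, 0) + qty
    return totals


def drill_across(*fact_tables):
    """Aggregate each fact table separately, then merge on the conformed attribute
    (a full outer merge: missing values are reported as 0)."""
    summaries = [aggregate(f) for f in fact_tables]
    products = sorted(set().union(*summaries))
    return [(p, *(s.get(p, 0) for s in summaries)) for p in products]


def naive_join_totals(orders, shipments):
    """Joining fact rows directly before aggregating repeats rows and over-counts."""
    totals = {}
    for p1, q1 in orders:
        for p2, q2 in shipments:
            if p1 == p2:
                o, s = totals.get(p1, (0, 0))
                totals[p1] = (o + q1, s + q2)
    return totals


print(drill_across(ORDERS, SHIPMENTS))  # [('Gadget', 3, 1), ('Gizmo', 0, 4), ('Widget', 15, 12)]
print(naive_join_totals(ORDERS, SHIPMENTS)["Widget"])  # (15, 24): shipments double-counted
```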

9
WHY AND HOW
Dimensional Design Patterns for Cause and Effect

There is occasions and causes why and wherefore in all things.


— William Shakespeare (1564–1616), "King Henry V", Act 5, scene 1

How am I doing?
— Ed Koch, Mayor of New York 1978–1989

Why and how dimensions are closely linked: they describe cause and effect

Some of the most valuable dimensions in a data warehouse attempt to explain why and how events occur. Why dimensions are used to describe direct and indirect causal factors. They are often closely linked to the how dimensions that provide all the remaining event descriptions that are not related to the major who, what, when and where dimension types. Together why and how represent cause and effect and complete the 7W dimensional description of a business event.

This chapter describes why and how dimension design patterns

In our final chapter we cover dimensional design patterns for describing how events occur and why facts vary. We focus particularly on bridge table patterns for representing multiple causal factors and multi-valued dimensions in general. We describe how bridge table weighting factors are used to preserve atomic fact granularity and avoid ETL-time fact allocations. We also describe how bridge tables can be augmented with multi-level dimensions and pivoted dimensions to efficiently handle barely multi-valued reporting and complex combination constraints. We conclude with step, range band and audit dimension techniques for analyzing sequential events, grouping by facts and handling ETL metadata.
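A weighted bridge table can be sketched as follows. The sales, reason groups and weights are hypothetical; each group's weights are assumed to sum to 1, so weighted totals reconcile to the atomic fact total, while the unweighted "impact" view counts a sale once per reason. The fact rows themselves stay at their atomic grain; allocation happens only at query time.

```python
# Hypothetical sales facts at atomic grain: (sale id, reason group key, revenue).
SALES_FACT = [("S1", "G1", 100.0), ("S2", "G2", 60.0)]

# Hypothetical bridge: (reason group key, reason, weighting factor).
# Weights within each group are assumed to sum to 1.0.
REASON_BRIDGE = [
    ("G1", "TV Ad", 0.5), ("G1", "Email", 0.5),
    ("G2", "Email", 1.0),
]


def revenue_by_reason(weighted=True):
    """Weighted totals reconcile to the fact total; unweighted totals show impact
    (each sale counted in full for every reason it is linked to)."""
    totals = {}
    for _sale, group, revenue in SALES_FACT:
        for bridge_group, reason, weight in REASON_BRIDGE:
            if bridge_group == group:
                factor = weight if weighted else 1.0
                totals[reason] = totals.get(reason, 0.0) + revenue * factor
    return totals


print(revenue_by_reason(weighted=True))   # {'TV Ad': 50.0, 'Email': 110.0}
print(revenue_by_reason(weighted=False))  # {'TV Ad': 100.0, 'Email': 160.0}
```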

Chapter 9 Design Challenges At a Glance

Direct and indirect causal factors
Attributing multiple causes to a fact
Dealing with barely multi-valued dimensions efficiently
Handling complex combination constraints
Understanding sequential behavior
Range band reporting
Tracking data quality and lineage
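Range band reporting groups facts by value bands rather than by a dimension key. This sketch assumes hypothetical revenue bands defined by their lower bounds and uses Python's bisect module to find each amount's band.

```python
import bisect
from collections import Counter

# Hypothetical revenue bands, defined by inclusive lower bounds.
BAND_FLOORS = [0, 100, 500, 1000]
BAND_LABELS = ["Under 100", "100 to 499.99", "500 to 999.99", "1000 and over"]


def band_for(amount):
    """Return the label of the band whose floor is the largest floor <= amount."""
    if amount < BAND_FLOORS[0]:
        raise ValueError(f"amount below lowest band: {amount}")
    return BAND_LABELS[bisect.bisect_right(BAND_FLOORS, amount) - 1]


ORDER_REVENUE = [45.0, 120.0, 480.0, 750.0, 99.99, 1500.0]  # hypothetical facts
band_counts = Counter(band_for(x) for x in ORDER_REVENUE)
print(band_counts)
```

Keeping the band definitions as data (rather than hard-coded conditions) lets stakeholders change the bands without touching the facts.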
Resources For Agile Dimensional Modelers

Websites
decisionone.co.uk : DecisionOne Consulting, Lawrence Corr’s training and consulting firm.

llumino.com : Llumino, Jim Stagnitto’s consulting firm.

modelstorming.com : The companion website to this book where you can download the BEAM✲
Modelstormer spreadsheet, the BI Model Canvas (inspired by the Business Model Canvas) plus other
useful BEAM✲ tools and example models from the book and beyond. It also contains links to our recommended books, articles, websites, and training courses.

End of Preview

If you would like to read more, you can buy

Agile Data Warehouse Design

using the following direct links:

US: amazon.com

UK: amazon.co.uk

World: bookdepository.com

eBook: modelstorming.com
