Unit – I (1)

The document provides an overview of data warehousing, business intelligence, and dimensional modeling concepts, emphasizing the importance of effective decision-making through accessible and reliable information. It outlines the goals and requirements of DW/BI systems, the roles of fact and dimension tables in dimensional modeling, and the architecture components of Kimball's DW/BI framework. Additionally, it discusses the ETL process and the significance of presenting data in a structured manner for analytical purposes.

Uploaded by

mailtoyashi04
Copyright
© All Rights Reserved

Unit – I

Data Warehousing, Business Intelligence and Dimensional Modeling Concepts

• Business-driven goals of data warehousing and business intelligence
• Metaphor for DW/BI systems
• Dimensional modeling core concepts – Fact
table and Dimension table
• Kimball DW/BI architecture components
• Alternative DW/BI architectures
• Myths about dimensional modeling
Data Warehouse
• Repository of information collected from
heterogeneous data sources that should be
integrated, time-variant and non-volatile.
• This repository of information can be analyzed
to make effective decisions
Business Intelligence
• Right information
• Right Time
• Right Format
• Right Tool
• Effective Decision Making Process
Important Asset of any organization?
Information - Purposes
• Operational record keeping
– Operational system
– Put the data in
• Analytical decision making
– DW/BI system
– Get the data out
Comparison between Operational systems and DW/BI system

Operational system | DW/BI system
Users turn the wheels of the organization | Users watch the wheels of the organization to evaluate performance
Users take orders, sign up new customers, monitor the status of operational activities, log complaints | Users count the new orders and compare them with last week's, last month's or last year's orders
Optimized to process transactions quickly | Optimized for high-performance queries that monitor operational processes
Deal with one transaction record at a time | Hundreds of thousands of transactions may be searched for analysis
Perform the same operational tasks over and over, executing the organization’s business processes | Analyze business processes for making effective decisions
Do not maintain history but have up-to-date data | Historical data is maintained to accurately evaluate the organization’s performance over time
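The contrast between "putting data in" one record at a time and "getting data out" across many records can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the `orders` table, its columns and the sample amounts are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical single-table example; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, week INTEGER, amount REAL)"
)

def take_order(week, amount):
    """Operational style: write one transaction record at a time."""
    conn.execute("INSERT INTO orders (week, amount) VALUES (?, ?)", (week, amount))

for week, amount in [(1, 10.0), (1, 20.0), (2, 15.0), (2, 5.0), (2, 30.0)]:
    take_order(week, amount)

def orders_per_week():
    """Analytic style: scan many records and compare one period with another."""
    return conn.execute(
        "SELECT week, COUNT(*), SUM(amount) FROM orders GROUP BY week ORDER BY week"
    ).fetchall()

weekly = orders_per_week()
print(weekly)
```

The operational function touches a single row per call, while the analytic query reads every row to count orders and compare week 1 with week 2.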
Goals / Requirements of Data Warehousing
and Business Intelligence
• The DW/BI system must make information
easily accessible.
– Information should be simple to understand and query results
should be returned quickly
• The DW/BI system must present information
consistently
– Data from heterogeneous data sources must be
cleaned, integrated and quality assured
Goals / Requirements of DW/BI (contd.)
• The DW/BI system must adapt to change
– User needs, business conditions, data and
technology
• The DW/BI system must present information
in a timely way
– Raw data -> actionable information
• The DW/BI system must be a secure bastion
that protects the information assets
– Protect organization’s confidential information
Goals / Requirements of DW/BI (contd.)
• The DW/BI system must serve as the
authoritative and trustworthy foundation for
improved decision making
– Right data
• The business community must accept the
DW/BI system to deem it successful
– Simple and fast
Responsibilities / Metaphor for DW/BI Managers
• Comparison with Magazine editor

Magazine Editor | DW/BI Manager
Understand the readers | Understand the business users
Ensure the magazine appeals to the readers | Deliver high-quality, relevant and accessible information and analytics to the business users
Sustain the publication | Sustain the DW/BI environment
Tutorial / Test 1
• Compare DW/BI Manager responsibility with
any real time process manager’s responsibility
Dimensional Modeling (DM)
• Preferred technique for presenting analytic data
because it addresses two simultaneous requirements
– Deliver data that’s understandable to the business
users
– Deliver fast query performance
• DM is used to make the database simple
• It ensures that users can easily understand the
data and allows software to navigate the database
and deliver results quickly and efficiently
Dimensional Modeling (Contd.)
• Dimensional Modeling is a data structure
technique optimized for data storage in a Data
warehouse.
• The purpose of dimensional modeling is to
optimize the database for faster retrieval of
data.
• The concept of Dimensional Modeling was
developed by Ralph Kimball and consists of
“fact” and “dimension” tables.
Dimensional Modeling (Contd.)
• A dimensional model in data warehouse is designed to read,
summarize, analyze numeric information like values, balances,
counts, weights, etc. in a data warehouse.
• In contrast, relational models are optimized for addition,
updating and deletion of data in a real-time online
transaction processing (OLTP) system.
• In the relational model, normalization and ER models reduce
redundancy in data. A dimensional model in a data warehouse
instead arranges data in such a way that it is easier to
retrieve information and generate reports.
• Hence, dimensional models are used in data warehouse
systems and are not a good fit for transaction processing systems.
Example
• An executive describes the business as
– We sell products in various markets and measure
our performance over time
• Dimensional Designers
– Emphasis on product, market, time (Dimensions)
– Cube of data with the edges labeled as product,
market and time
– Points inside the cube store measurements such
as sales volume or profit
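The cube described in this example can be pictured as a mapping from a (product, market, time) coordinate to a measurement. The sketch below is a minimal illustration; the products, markets, months and sales volumes are made-up values, and `roll_up` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical cells of the cube: (product, market, month) -> sales volume
cube = {
    ("laptop", "north", "2024-01"): 120,
    ("laptop", "south", "2024-01"): 80,
    ("phone", "north", "2024-01"): 200,
    ("phone", "north", "2024-02"): 150,
}

# Position of each cube edge inside the coordinate tuple
AXES = {"product": 0, "market": 1, "time": 2}

def roll_up(cube, axis):
    """Sum the measurements along every edge except the named one."""
    totals = defaultdict(int)
    for cell, value in cube.items():
        totals[cell[AXES[axis]]] += value
    return dict(totals)

by_product = roll_up(cube, "product")
by_time = roll_up(cube, "time")
print(by_product)
print(by_time)
```

Each call collapses two edges of the cube and keeps one, which is the kind of question ("sales by product", "sales over time") the executive's description implies.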
Star Schema Versus OLAP Cubes
• Dimensional models implemented in RDBMS are referred
to as star schemas because of their star like structure
• Dimensional models implemented in a multidimensional
database environment are referred to as online analytical
processing (OLAP) cubes
• A DW/BI environment can have either star schemas or OLAP
cubes
• Both star schemas and OLAP cubes share a common logical
design, but the physical implementation differs
• Atomic information should be loaded into a star schema first;
optional OLAP cubes are then populated from the star schema
OLAP Deployment Considerations
• Considerations when deploying data into OLAP cubes
– A star schema is a good physical foundation for building an
OLAP cube and a more stable basis for backup and recovery
– OLAP cubes
• have extreme performance advantages over RDBMSs
• are more variable across different vendors than RDBMSs
• offer more sophisticated security options
• offer significantly richer analysis capabilities
• support slowly changing dimensions
• support transaction and periodic snapshot fact tables
• support complex hierarchies of data
• may impose detailed constraints on drill-down hierarchies
Elements of Dimensional Data Model

• Fact
– Facts are the measurements/metrics or facts from your business process. For a Sales
business process, a measurement would be quarterly sales number
• Dimension
– Dimension provides the context surrounding a business process event. In simple
terms, they give who, what, where of a fact. In the Sales business process, for the
fact quarterly sales number, dimensions would be
• Who – Customer Names
• Where – Location
• What – Product Name
– In other words, a dimension is a window to view information in the facts.
• Attributes
– The Attributes are the various characteristics of the dimension in dimensional data
modeling.
– In the Location dimension, the attributes can be
• State
• Country
• Zipcode etc.
– Attributes are used to search, filter, or classify facts. Dimension Tables contain
Attributes
Elements of Dimensional Data Model (Contd.)
• Fact Table
– A fact table is a primary table in dimensional modeling.
– A Fact Table contains
• Measurements/facts
• Foreign keys to dimension tables
• Dimension Table
– A dimension table contains dimensions of a fact.
– Dimension tables are joined to the fact table via a foreign key.
– Dimension tables are de-normalized tables.
– The dimension attributes are the various columns in a dimension table.
– Dimensions offer descriptive characteristics of the facts with the help of
their attributes.
– There is no set limit on the number of dimensions.
– A dimension can also contain one or more hierarchical relationships.
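The relationship between a fact table and its dimension tables can be sketched as table definitions. The example below uses Python's built-in `sqlite3` module; the table names (`dim_location`, `dim_product`, `fact_sales`) and columns are assumptions chosen to mirror the Location and Product examples, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one primary key plus descriptive attributes
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY,
    state TEXT, country TEXT, zipcode TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT
);
-- Fact table: foreign keys to the dimensions plus a numeric measurement
CREATE TABLE fact_sales (
    location_key INTEGER REFERENCES dim_location(location_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
""")

# List the tables and the dimensions that the fact table points at
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
fk_targets = sorted(r[2] for r in conn.execute("PRAGMA foreign_key_list(fact_sales)"))
print(tables, fk_targets)
```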
Steps of Dimensional Modeling

• Identify Business Process
• Identify Grain (level of detail)
• Identify Dimensions
• Identify Facts
• Build Star Schema
• The model should describe the Why, How
much, When/Where/Who and What of your
business process
Step 1) Identify the Business Process

• Identify the actual business process that the
data warehouse should cover. This could be
Marketing, Sales, HR, etc. as per the data
analysis needs of the organization.
Step 2) Identify the Grain
• The Grain describes the level of detail for the business
problem/solution.
• It is the process of identifying the lowest level of information
for any table in your data warehouse.
• If a table contains sales data for every day, then it should be
daily granularity. If a table contains total sales data for each
month, then it has monthly granularity.
• Example of Grain:
– The CEO at an MNC wants to find the sales for specific
products in different locations on a daily basis.
– So, the grain is “product sale information by location by
the day.”
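The effect of choosing a grain can be illustrated with a small sketch: rows stored at daily grain can always be rolled up to monthly grain, but rows stored at monthly grain cannot be split back into days. The sample rows and the helper name `to_monthly_grain` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows at daily grain: one row per product, location and day
daily_sales = [
    ("widget", "Pune", "2024-01-30", 100.0),
    ("widget", "Pune", "2024-01-31", 50.0),
    ("widget", "Pune", "2024-02-01", 70.0),
    ("widget", "Delhi", "2024-01-31", 40.0),
]

def to_monthly_grain(rows):
    """Roll daily rows up to one row per product, location and month."""
    monthly = defaultdict(float)
    for product, location, day, amount in rows:
        month = day[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        monthly[(product, location, month)] += amount
    return dict(monthly)

monthly_sales = to_monthly_grain(daily_sales)
print(monthly_sales)
```

This is one reason to prefer the lowest practical level of detail: the finer grain answers both the daily and the monthly question.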
Step 3) Identify the Dimensions

• Dimensions are nouns like date, store, inventory, etc.
• Dimension tables are where the descriptive data should be stored.
• For example, the date dimension may contain data like
year, month and weekday.
• Example of Dimensions:
– The CEO at an MNC wants to find the sales for specific products
in different locations on a daily basis.
– Dimensions: Product, Location and Time
– Attributes: For Product: Product key (primary key of the dimension,
referenced as a foreign key by the fact table), Name, Type, Specifications
– Hierarchies: For Location: Country, State, City, Street Address,
Name
Step 4) Identify the Fact

• This step is closely associated with the business
users of the system because this is where they
get access to data stored in the data warehouse.
• Most fact table columns hold numerical values
like price or cost per unit.
• Example of Facts:
– The CEO at an MNC wants to find the sales for
specific products in different locations on a daily
basis.
– The fact here is Sum of Sales by product by location
by time.
Step 5) Build Schema

• In this step, you implement the Dimension Model.
• A schema is nothing but the database structure (arrangement of
tables). There are two popular schemas
• Star Schema
– The star schema architecture is easy to design. It is called a star
schema because its diagram resembles a star, with points radiating
from a center. The center of the star consists of the fact table,
and the points of the star are the dimension tables.
– The fact table in a star schema is in third normal form,
whereas the dimension tables are de-normalized.
• Snowflake Schema
– The snowflake schema is an extension of the star schema. In a
snowflake schema, each dimension is normalized and
connected to more dimension tables.
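The difference between the two schemas can be sketched for a single location dimension. In this illustration (Python's built-in `sqlite3`; all table names and sample rows are assumptions), the star version keeps the whole hierarchy in one de-normalized table, while the snowflake version splits it and needs an extra join to return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one de-normalized location dimension
CREATE TABLE dim_location_star (
    location_key INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT
);
INSERT INTO dim_location_star VALUES
    (1, 'Pune', 'Maharashtra', 'India'),
    (2, 'Mumbai', 'Maharashtra', 'India');

-- Snowflake: the same hierarchy split into normalized tables
CREATE TABLE dim_state (state_key INTEGER PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE dim_city (
    location_key INTEGER PRIMARY KEY, city TEXT,
    state_key INTEGER REFERENCES dim_state(state_key)
);
INSERT INTO dim_state VALUES (10, 'Maharashtra', 'India');
INSERT INTO dim_city VALUES (1, 'Pune', 10), (2, 'Mumbai', 10);
""")

star_rows = conn.execute(
    "SELECT location_key, city, state, country "
    "FROM dim_location_star ORDER BY location_key"
).fetchall()

snowflake_rows = conn.execute("""
    SELECT c.location_key, c.city, s.state, s.country
    FROM dim_city c JOIN dim_state s ON s.state_key = c.state_key
    ORDER BY c.location_key
""").fetchall()

print(star_rows == snowflake_rows)
```

The snowflake form stores 'Maharashtra' and 'India' once, at the cost of one more join per query.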
Example – ER Model
Star Schema
Snowflake Schema
Tutorial / Test 2
• Identify Facts, Dimensions and attributes of
any real time application
Dimensional Modeling
• A dimensional model is a data structure technique
optimized for Data warehousing tools.
• Facts are the measurements/metrics or facts from
your business process.
• Dimension provides the context surrounding a
business process event.
• Attributes are the various characteristics of a
dimension in dimensional modeling.
• A fact table is a primary table in a dimensional model.
• A dimension table contains dimensions of a fact.
Fact Table Grains - Categories
• Transaction
• Periodic snapshot
• Accumulating snapshot
Fact Table for Measurements
• The numeric measures in a fact table fall into three categories
1. Additive
– The most flexible and useful facts are fully additive;
– additive measures can be summed across any of the dimensions
associated with the fact table.
2. Semi-additive
– measures can be summed across some dimensions, but not all;
– balance amounts are common semi-additive facts because they are
additive across all dimensions except time.
– examples: the balance of a bank account, the inventory level of a
product, or the headcount of an organization
3. Non-additive
– some measures are completely non-additive, such as ratios.
– A good approach for non-additive facts is, where possible, to store
the fully additive components of the non-additive measure and sum
these components into the final answer set before calculating the
final non-additive fact.
– This final calculation is often done in the BI layer or OLAP cube.
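The advice for non-additive facts can be checked with a small worked example. The sketch below assumes a hypothetical margin ratio built from two additive components, revenue and cost; the store names and amounts are made up for illustration.

```python
# Hypothetical fact rows that store only the additive components
rows = [
    {"store": "A", "revenue": 100.0, "cost": 60.0},
    {"store": "B", "revenue": 300.0, "cost": 270.0},
]

def margin_ratio(revenue, cost):
    """Non-additive measure derived from two additive components."""
    return (revenue - cost) / revenue

# Misleading: combining the per-row ratios directly (here by averaging them)
naive = sum(margin_ratio(r["revenue"], r["cost"]) for r in rows) / len(rows)

# Recommended: sum the additive components first, then compute the ratio once
total_revenue = sum(r["revenue"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
overall = margin_ratio(total_revenue, total_cost)

print(round(naive, 3), round(overall, 3))
```

The two answers differ because store B carries three times the revenue of store A, which the averaged ratios ignore.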
Additive Facts - Examples
• Sales_Amount is the fact. In this case,
Sales_Amount is an additive fact, because you
can sum up this fact along any of the three
dimensions present in the fact table -- date,
store, and product. For example, the sum of
Sales_Amount for all 7 days in a week
represents the total sales amount for that week.
Semi-additive facts - Examples
• The balance of a bank account, the inventory
level of a product, or the headcount of an
organization are semi-additive facts. You can
aggregate them by account, by product, or by
department, but not by date.
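A short sketch of the semi-additive case, using hypothetical month-end account balances: summing across accounts within one month is meaningful, while across months an average (or the latest value) is used instead of a sum. The account names, months and helper function names are assumptions.

```python
# Hypothetical month-end balances: (account, month) -> balance
balances = {
    ("acct-1", "2024-01"): 100.0,
    ("acct-1", "2024-02"): 120.0,
    ("acct-2", "2024-01"): 50.0,
    ("acct-2", "2024-02"): 30.0,
}

def total_balance_for_month(balances, month):
    """Summing across accounts within one period is meaningful."""
    return sum(v for (acct, m), v in balances.items() if m == month)

def average_balance_for_account(balances, account):
    """Across time, average the balances rather than summing them."""
    values = [v for (acct, m), v in balances.items() if acct == account]
    return sum(values) / len(values)

feb_total = total_balance_for_month(balances, "2024-02")
acct1_avg = average_balance_for_account(balances, "acct-1")
print(feb_total, acct1_avg)
```

Adding the January and February balances of acct-1 would give 220.0, an amount the account never held.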
Business Process Measurement events
translate into fact tables
Fact Table Characteristics
• All fact tables have two or more foreign keys that
connect to the dimension tables’ primary keys
• When all the keys in the fact table correctly match their
respective primary keys in the corresponding
dimension tables, the tables satisfy referential integrity
• The fact table generally has its own primary key
composed of a subset of the foreign keys.
• This key is often called a composite key
• In a dimensional model, every table that has a composite key is a fact table.
• Fact tables express many-to-many relationships.
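Referential integrity and the composite key can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under assumed names (`dim_date`, `dim_store`, `fact_sales`, `try_insert`); note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    store_key INTEGER NOT NULL REFERENCES dim_store(store_key),
    sales_amount REAL,
    PRIMARY KEY (date_key, store_key)  -- composite key built from the foreign keys
);
INSERT INTO dim_date VALUES (20240101, '2024-01-01');
INSERT INTO dim_store VALUES (1, 'Main Street');
""")

def try_insert(date_key, store_key, amount):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO fact_sales VALUES (?, ?, ?)",
                (date_key, store_key, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_insert(20240101, 1, 99.5)        # both keys exist in their dimensions
orphan = try_insert(20240101, 2, 10.0)    # store 2 is missing from dim_store
duplicate = try_insert(20240101, 1, 5.0)  # repeats the composite key
print(ok, orphan, duplicate)
```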
Dimension Tables
• Dimension tables are integral companions to a fact table.
• They contain descriptive characteristics of business process
nouns.
• The dimension tables contain the textual context associated
with a business process measurement event.
• They describe the ‘who’, ‘what’, ‘where’, ‘when’, ‘how’ and ‘why’
associated with the event.
• Each dimension is defined by a single primary key which
serves as the basis for referential integrity with any given fact
table to which it is joined.
• Dimensions provide the entry points to the data and the final
labels and groupings on all DW/BI analyses
Dimension tables
Sample rows from a dimension table with
de-normalized hierarchies
Facts and Dimensions Joined in a
Star Schema
Dimensional attributes and facts form a
simple report
SQL to create the report
• SELECT statement identifies the dimension
attributes in the report followed by aggregated
metric from the fact table.
• FROM clause identifies all the tables involved in the
query
• WHERE clause declares the report’s filter and
declares the joins between the dimension and fact
tables
• GROUP BY clause establishes the aggregation within
the report
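The four clauses described above can be seen together in a runnable sketch. It uses Python's built-in `sqlite3` module with a tiny star schema; the table names, columns and sample rows are hypothetical, and the `ORDER BY` clause is an addition here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, district TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER, store_key INTEGER, product_key INTEGER, sales_amount REAL
);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_product VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 100.0), (1, 1, 2, 40.0), (1, 2, 1, 60.0),
    (1, 1, 1, 25.0), (2, 1, 1, 999.0);
""")

REPORT_SQL = """
SELECT s.district, p.brand, SUM(f.sales_amount) AS sales_total  -- attributes, then metric
FROM fact_sales f, dim_date d, dim_store s, dim_product p       -- all tables involved
WHERE d.month = '2024-01'                                       -- report filter
  AND f.date_key = d.date_key                                   -- joins to the dimensions
  AND f.store_key = s.store_key
  AND f.product_key = p.product_key
GROUP BY s.district, p.brand                                    -- aggregation level
ORDER BY s.district, p.brand
"""

report = conn.execute(REPORT_SQL).fetchall()
for row in report:
    print(row)
```

The February row is excluded by the filter, and the two January rows for North/Acme are summed into one report line.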
Tutorial / Test 3
• Draw fact table, dimension table and create
star schema for any real time application.
Core elements of Kimball DW/BI
architecture
Operational Source Systems
• Heterogeneous historical data
• Special purpose applications
• ERP system
• Operational master data management system
ETL
• Consists of a work area, instantiated data
structures and a set of processes
• Sits between the operational source systems and the DW/BI
presentation area.
ETL process
• Extraction
– Getting data into DW
– Reading and understanding the source data and copying the data needed into
ETL system for further manipulation
• Transformation
– Cleaning the data
– Combining data from multiple sources
– Removing redundancy
• Load
– Physical structuring and loading of data into presentation area’s target
dimensional models.
– Various subsystems
– Responsibilities
• Dimension table processing
• Surrogate key assignment
• Code lookups
• Splitting or combining columns
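The three ETL stages listed above can be sketched as three small functions. Everything in this example is hypothetical: the two source extracts, the `GENDER_CODES` lookup and the customer dimension layout are assumptions for illustration, not part of any specific ETL tool.

```python
# Hypothetical source extracts from two operational systems
source_a = [{"cust_id": "A-1", "name": " Asha Rao ", "gender": "F"}]
source_b = [
    {"cust_id": "A-1", "name": "Asha Rao", "gender": "F"},  # duplicate of source_a
    {"cust_id": "B-7", "name": "Ravi Kumar", "gender": "M"},
]
GENDER_CODES = {"F": "Female", "M": "Male"}  # assumed code lookup table

def extract():
    """Copy the needed source rows into the ETL work area."""
    return [dict(row) for row in source_a + source_b]

def transform(rows):
    """Clean, combine sources, remove redundancy, look up codes, split columns."""
    seen, cleaned = set(), []
    for row in rows:
        if row["cust_id"] in seen:  # drop redundant copies of the same customer
            continue
        seen.add(row["cust_id"])
        first, last = row["name"].strip().split(" ", 1)  # split one column into two
        cleaned.append({
            "natural_key": row["cust_id"],
            "first_name": first,
            "last_name": last,
            "gender": GENDER_CODES[row["gender"]],  # code lookup
        })
    return cleaned

def load(rows):
    """Assign surrogate keys and build the target dimension table."""
    return [{"customer_key": key, **row} for key, row in enumerate(rows, start=1)]

dim_customer = load(transform(extract()))
for row in dim_customer:
    print(row)
```

The surrogate key (`customer_key`) is generated by the load step and is independent of the source system's own identifier, which is kept as `natural_key`.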
Presentation Area
• Place where the data is organized, stored and made
available for direct querying by users, report writers or
other BI applications
• The data should be presented, stored and accessed in
dimensional schemas, either star schemas or OLAP
cubes
• It must contain detailed atomic data
• It may also contain aggregated data
• It should be structured around business process
measurement events, i.e. be business process centric
• It must adhere to the enterprise data warehouse bus architecture
BI Applications
• Query the data in DW/BI presentation area
• Analytic decision making process
• Simple or complex
Restaurant metaphor for Kimball
Architecture
• Similarity between a restaurant and DW/BI
environment
• ETL System
– Kitchen of a restaurant
• Layout
• High Throughput
• Special ingredients
• Quality
• Consistency
• High integrity
– Skilled professionals or chefs
• ETL process
– Source data is transformed into meaningful presentable information
– Ensure throughput
– Consistent product
– High integrity
– Skilled professionals
• Front room
– BI applications
– Done by business users
• Back room
– ETL system
– Done by ETL staff
Data Presentation and BI in Front Dining
Room
• Restaurant score
– Food ( Quality, taste and presentation)
– Décor (appealing, comfortable surroundings)
– Service (prompt food delivery, support staff, food
received as ordered)
– Cost
Alternative DW/BI architectures
• Independent data mart architecture
• Hub-and-Spoke Corporate Information Factory Inmon architecture
• Hybrid Hub-and-Spoke and Kimball Architecture
Independent data mart architecture

• Analytic data is deployed on a departmental basis without concern
for sharing and integrating information across the enterprise.
• A single department identifies requirements for data from an
operational source system.
• Also called standalone analytics
• Advantages
– Easy to understand
– Highly responsive to queries
– Worked well for confirmed dimensions
• Disadvantages
– No consistency
– Multiple uncoordinated extracts
– Redundant
– Insufficient
– Incompatible
Independent data mart architecture
Hub-and-Spoke Corporate Information Factory Inmon architecture

• Hub-and-Spoke Corporate Information Factory
(CIF) created by Bill Inmon
• Data is extracted from operational source
systems and processed through an ETL system
called data acquisition
• The atomic data that results from this
processing is normalized and is referred to as the
Enterprise Data Warehouse (EDW) within the CIF
architecture.
• Enterprise data coordination and integration
Hub-and-Spoke Corporate Information
Factory Inmon architecture
Hybrid Hub-and-Spoke and Kimball Architecture

• Integration of Kimball and Inmon CIF architectures
• Populates a CIF-centric EDW, which then feeds a Kimball-style
presentation area in which the data is dimensional, atomic,
process centric and conforms to the enterprise data warehouse
bus architecture
• Combines the advantages of both methods
Hybrid Hub-and-Spoke and Kimball Architecture
Dimensional Modeling myths
• Dimensional models are only for summary
data
• Dimensional models are departmental not
enterprise
• Dimensional models are not scalable
• Dimensional models are only for predictable
usage
• Dimensional models can’t be integrated
Agile Considerations
• Focus on manageably sized increments of work
• Agile methodologies align with the following Kimball best
practices
– Focus on delivering business value
– Value collaboration between development team and
business stakeholders
– Stress ongoing face-to-face communication, feedback
and prioritization with the business stakeholders
– Adapt quickly to inevitable evolving requirements
– Tackle development in an iterative incremental
manner
