Data Warehouse Architecture
Data Warehousing Architecture
OLTP → ETL → Data Warehouse → Reporting
[Figure: Sources 1 to n (internal/external sources) → Extract (push/pull) → Staging Layer → Cleansing, Transformation & Loading (ETL) → Data Warehouse (DWH; summaries/aggregations) → Reporting Layer (canned reports, ad-hoc analysis)]
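The flow from sources through staging into the warehouse and on to reporting can be sketched in a few lines. This is a minimal illustration only: the source rows, field names and the cleansing rule (drop rows whose amount is not numeric) are assumptions made for the example, not part of the architecture described above.

```python
# Hypothetical source extracts; the field names are illustrative only.
source_1 = [{"order_id": 1, "product": " Widget ", "amount": "10.50"},
            {"order_id": 2, "product": "gadget", "amount": "4.00"}]
source_2 = [{"order_id": 3, "product": "WIDGET", "amount": "n/a"},   # bad amount
            {"order_id": 4, "product": "Gadget", "amount": "6.00"}]


def extract(*sources):
    """Pull rows from every source into one staging list, untouched."""
    staging = []
    for rows in sources:
        staging.extend(dict(row) for row in rows)
    return staging


def cleanse_and_transform(staging):
    """Normalize product names, parse amounts, drop rows that fail parsing."""
    clean = []
    for row in staging:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append({"order_id": row["order_id"],
                      "product": row["product"].strip().lower(),
                      "amount": amount})
    return clean


def aggregate(warehouse):
    """Build the per-product summary that a canned report would read."""
    totals = {}
    for row in warehouse:
        totals[row["product"]] = totals.get(row["product"], 0.0) + row["amount"]
    return totals


staging = extract(source_1, source_2)        # staging layer
warehouse = cleanse_and_transform(staging)   # "loading" is just keeping the list here
summary = aggregate(warehouse)               # summaries/aggregations
print(summary)
```

Keeping the raw rows in a separate staging list means the cleansing step can be rerun or audited without going back to the source systems.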
Data Warehousing Architecture (ODS)
[Figure: Sources 1 to n (internal/external sources) → Extract (push/pull) → Staging Layer → Cleansing, Transformation & Loading (ETL) → ODS (detail data) → Transformation, Summarization, Aggregation → Data Warehouse (DWH; summaries/aggregations) → Reporting Layer (canned reports, ad-hoc analysis)]
Data Warehousing Architecture (ODS & Data Marts)
[Figure: Sources 1 to n (internal/external sources) → Extract (push/pull) → Staging Layer → Cleansing, Transformation & Loading (ETL) → ODS (detail data) → Transformation, Summarization, Aggregation → Data Warehouse (DWH; summaries/aggregations) → Data Marts (cubes, conformed dimensions) → Reporting Layer (canned reports, ad-hoc analysis)]
Data Modeling
Steps in Data Modeling
Requirement Gathering
Analysis
Logical Database Design
Physical Database Design
Data Modeling Types
Conceptual Data modeling
Describe data requirements from a business point of
view without technical details
Logical Data modeling
Refine conceptual models
Data structure oriented, platform independent
Physical Data modeling
Detailed specification of what is physically
implemented using specific technology
Conceptual Data Model
A conceptual model shows data through business eyes. It includes:
All entities that have business meaning
Important relationships
A few significant attributes in the entities
Logical Data Model
This is a refinement and extension of the conceptual data
model.
The logical data model includes all required entities, attributes, key
groups and relationships that represent business information and
define business rules.
Physical Data Model
A Physical data model may include
Referential Integrity
Indexes
Views
Alternate keys and other constraints
Tablespaces and other physical storage objects
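As a rough sketch of how the items listed above appear in a physical model, the following uses Python's built-in `sqlite3` module. The table and column names are hypothetical, and SQLite is used only because it runs without setup; it has no tablespaces, so that item is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE          -- alternate key
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id),   -- referential integrity
    amount      REAL NOT NULL CHECK (amount >= 0)   -- another constraint
);
CREATE INDEX idx_orders_customer ON orders(customer_id);   -- index
CREATE VIEW customer_totals AS                              -- view
    SELECT customer_id, SUM(amount) AS total
    FROM orders GROUP BY customer_id;
""")
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# An order for a customer that does not exist violates referential integrity.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

totals = conn.execute("SELECT * FROM customer_totals").fetchall()
print(fk_rejected, totals)
```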
Enterprise Data Model
The enterprise data model, sometimes called the global
business model, captures the organization's entire
information in the form of entities.
Enterprise Data Model Example
Entity-Relationship Modeling
Traditional modeling technique
Technique of choice for OLTP
Suited for corporate data warehouse
Limitations of E-R Modeling
Poor Performance
Tend to be very complex and difficult to
navigate.
Dimensional Modeling
Dimensional data modeling comprises one or more dimension tables and fact tables.
E.g., dimension tables: Location, Product, Time, Organization, etc.
A dimension table stores columns (dimensions) that describe the objects in a fact
table. Dimension tables contain the textual descriptors of the business. Each dimension is
defined by its single primary key.
End users can easily understand and navigate the data structure.
Dimensional Modeling
Dimensional modeling uses two basic concepts:
facts (measures) and dimensions.
It is powerful in representing the requirements of the
business user in the context of database tables.
It focuses on numeric data, such as values, counts,
weights, balances and occurrences.
Dimensional modeling
Must identify
Business process to be supported
Grain (level of detail)
Dimensions
Facts
What is a Fact?
A fact is a collection of related data items,
consisting of measures and context data.
Each fact typically represents a business item, a
business transaction, or an event that can be used
in analyzing the business or business process.
Facts are measured, “continuously valued”,
rapidly changing information. They can be calculated
and/or derived.
Types of Facts
Additive
Additive facts can be summed across all of the
dimensions in the fact table.
E.g., retail sales in $ (a sales fact).
Semi-Additive
Semi-additive facts can be summed across some of the dimensions
in the fact table, but not the others.
E.g., a daily balances fact can be summed across the customer dimension
but not across the time dimension.
Non-Additive
Non-additive facts cannot be summed across any of the dimensions present in
the fact table.
E.g., percentages, ratios, etc.
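The three kinds of additivity can be checked with a small worked example. The rows below are invented for illustration; the point is only which sums are meaningful and which are not.

```python
# Hypothetical daily balance snapshots: (day, customer, balance).
balances = [
    ("2013-04-01", "alice", 100.0), ("2013-04-01", "bob", 50.0),
    ("2013-04-02", "alice", 120.0), ("2013-04-02", "bob", 30.0),
]
# Hypothetical sales facts: (day, store, units, revenue).
sales = [
    ("2013-04-01", "north", 10, 200.0),
    ("2013-04-01", "south", 30, 300.0),
]

# Additive: revenue can be summed across any dimension (here, across stores).
total_revenue = sum(revenue for _, _, _, revenue in sales)

# Semi-additive: balances can be summed across customers for a single day ...
balance_on_day_2 = sum(b for day, _, b in balances if day == "2013-04-02")
# ... but summing across days counts the same money more than once.
misleading_total = sum(b for _, _, b in balances)

# Non-additive: a ratio such as price per unit has to be recomputed from its
# additive components rather than summed or averaged row by row.
avg_price = total_revenue / sum(units for _, _, units, _ in sales)
naive_avg = sum(rev / units for _, _, units, rev in sales) / len(sales)

print(total_revenue, balance_on_day_2, misleading_total, avg_price, naive_avg)
```

Here the correct price per unit is 12.5, while averaging the per-row ratios gives 15.0, which is why ratios are usually stored as their additive numerator and denominator.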
Classification of Facts
Fact tables can be classified into two types:
Cumulative Facts
Snapshot Facts
Cumulative Facts: this type of fact table describes what has happened over a period
of time.
E.g., additive facts such as total sales by product by store by day, week, month or year.
Snapshot Facts: this type of fact table describes the state of things at a particular
instant of time.
E.g., semi-additive and non-additive facts.
Factless Fact Table
Some event tables have no obvious numeric facts
(measures); these are called factless fact tables.
Events are often modeled as a fact table containing a series
of keys, each representing a participating dimension in the
event.
Example: Promotion table
PROMO ID   Promotion     Start Dt   End Dt   Description
2213       Credit card   230413     270413   10% cash back
2214       Credit card   280413     010513   15% cash back
In the above example, the PROMO ID values are surrogate keys,
not measures.
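Because a factless fact table holds only keys, analysis is done by counting rows rather than summing a measure. The sketch below assumes a hypothetical promotion-coverage table with `date_key`, `store_key` and `promo_id` columns; only the two promotion IDs come from the example above.

```python
from collections import Counter

# Each row records that an event happened; it holds dimension keys only.
# Columns (hypothetical): date_key, store_key, promo_id.
promotion_coverage = [
    (20130423, 1, 2213),
    (20130423, 2, 2213),
    (20130428, 1, 2214),
]

# With no numeric measure to sum, questions are answered by counting rows.
rows_per_promo = Counter(promo_id for _, _, promo_id in promotion_coverage)
print(rows_per_promo)
```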
Dimension Types
Conformed Dimension
Junk Dimension
Slowly Changing Dimension
Degenerate Dimension
Conformed Dimension
A conformed dimension is a dimension that is standard
(same meaning and content) across all data marts.
For example, an enterprise data warehouse's data can be
segmented into a Sales data mart, an Inventory and Shipping
data mart, a Finance data mart, a Geographical data mart,
an HR and Management data mart, and so on; a conformed
dimension is shared unchanged by all of them.
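One practical consequence of a conformed dimension is that results from different data marts can be lined up side by side. The following sketch assumes a hypothetical product dimension shared by a sales mart and an inventory mart; the keys, names and measures are invented.

```python
# One shared (conformed) product dimension; keys and names are hypothetical.
dim_product = {1: "Widget", 2: "Gadget"}

# Fact rows from two different data marts, both keyed by the same product_key.
sales_facts = [(1, 5), (2, 3), (1, 2)]      # (product_key, units_sold)
inventory_facts = [(1, 40), (2, 10)]        # (product_key, units_on_hand)


def total_by_product(facts):
    """Sum a measure per product name using the shared dimension."""
    totals = {}
    for product_key, value in facts:
        name = dim_product[product_key]
        totals[name] = totals.get(name, 0) + value
    return totals


sold = total_by_product(sales_facts)
on_hand = total_by_product(inventory_facts)

# The two results line up because both marts use the same dimension.
report = {name: (sold.get(name, 0), on_hand.get(name, 0))
          for name in dim_product.values()}
print(report)
```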
Junk Dimension
A junk dimension is used to record a collection of
low-cardinality flag and indicator data.
Flag data may be answers to simple questions, such as
Yes/No, True/False or Activate/Deactivate.
Indicator data may be small text values such as Height,
Width, Weight, Color, Status.
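One common way to build a junk dimension is to pre-generate a row for every combination of flag values and give each row a surrogate key. The flag names and values below are hypothetical, and this is a sketch of that approach rather than the only way to populate such a table.

```python
from itertools import product

# Hypothetical low-cardinality flags and indicators.
flags = {
    "is_active": ["Yes", "No"],
    "is_gift": ["Yes", "No"],
    "status": ["Open", "Closed", "Pending"],
}

# One junk-dimension row per combination, each with its own surrogate key.
junk_dimension = {
    key: dict(zip(flags, combo))
    for key, combo in enumerate(product(*flags.values()), start=1)
}

# Reverse lookup used while loading facts: flag values -> junk key.
junk_key_for = {tuple(row.values()): key for key, row in junk_dimension.items()}

# The fact row carries a single junk key instead of three separate flag columns.
fact_row = {"amount": 19.99, "junk_key": junk_key_for[("Yes", "No", "Pending")]}
print(len(junk_dimension), fact_row)
```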
[Figure 1: image not available]
[Figure 2: image not available]
Degenerate Dimension
The term degenerate dimension refers to a field that is
used as a criterion of analysis and that is stored in the fact
table.
Such a field has no dimension table of its own, but fact
rows can still be grouped or summarized by it.
Item numbers, ticket numbers, transaction numbers, etc. are
examples of degenerate dimensions.
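A short sketch of how a degenerate dimension is used: the ticket number lives in the fact rows themselves and serves as a grouping key. The rows and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales fact rows: ticket_number is stored in the fact table
# itself and has no dimension table of its own.
sales_fact = [
    {"ticket_number": "T-1001", "product_key": 1, "amount": 4.50},
    {"ticket_number": "T-1001", "product_key": 2, "amount": 3.00},
    {"ticket_number": "T-1002", "product_key": 1, "amount": 4.50},
]

# Group line items by the degenerate dimension to get per-ticket totals.
ticket_totals = defaultdict(float)
for row in sales_fact:
    ticket_totals[row["ticket_number"]] += row["amount"]

print(dict(ticket_totals))
```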
Data marts (DM)
A data mart is a subset of a data warehouse.
It is similar to a data warehouse but limited in scope and purpose,
and is usually aligned with one department, function, application or
business unit.
Several names for DMs:
• Departmental DSS DBs
• OLAP databases
• Multi-dimensional DBs (MDDB) or cubes
• Lightly summarized tables
Data marts Types
• Dependent data marts are marts that are fed directly by the DW,
sometimes supplemented with other feeds, such as external data.
• Independent data marts are marts that are fed directly by external
sources and do not use the DW.
• Embedded data marts are marts that are stored within the central DW.
They can be stored relationally as files or cubes.
Operational Data Store (ODS)
An ODS
• pulls together, validates, cleanses and integrates data
• is a foundation for providing an integrated view of enterprise data
• supports tactical decision support, day-to-day operations and management
reporting
Characteristics
Integrated
Subject-oriented
Volatile (including update)
Current valued
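The "volatile" and "current valued" characteristics can be illustrated by contrasting an ODS that overwrites a record in place with a store that appends every change. The change records below are invented, and the append-only list standing in for a warehouse is an assumption added for contrast, not something stated in the list above.

```python
# Hypothetical stream of customer address changes from a source system.
changes = [
    ("2013-04-23", "C1", "Pune"),
    ("2013-04-24", "C2", "Delhi"),
    ("2013-04-28", "C1", "Mumbai"),   # C1 moves
]

ods = {}         # current valued: one row per customer, overwritten on update
warehouse = []   # assumed history store: every change is appended

for change_date, customer_id, city in changes:
    ods[customer_id] = {"city": city, "as_of": change_date}   # update in place
    warehouse.append((change_date, customer_id, city))        # insert only

print(ods["C1"]["city"], len(ods), len(warehouse))
```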
Types of Schemas
- Star schema
- Snowflake schema
- Constellation schema (also called Integrated, Galaxy or Hybrid schema)
Star Schema Design
A single fact table surrounded by denormalized dimension
tables.
The fact table contains transaction-type information.
A data mart can contain many star schemas.
Easily understood by end users, but requires more disk
storage.
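A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table names, columns and rows are hypothetical; the sketch only shows the shape described above, with one fact table and denormalized dimensions where the category is stored directly on each product row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: category sits directly on the product row.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time    (time_key    INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- Fact table: one row per product per day, with a key to each dimension.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_time    VALUES (1, '2013-04-23', '2013-04'), (2, '2013-05-01', '2013-05');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 7.5);
""")

# A typical star query: one join per dimension, then group by its attributes.
rows = conn.execute("""
    SELECT p.category, t.month, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_time    t ON t.time_key    = f.time_key
    GROUP BY p.category, t.month
    ORDER BY p.category, t.month
""").fetchall()
print(rows)
```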
Example of Star Schema
Snowflake Schema
A single fact table surrounded by normalized dimension tables.
Used when dimensions become very large.
Less intuitive, with slower performance due to the additional joins.
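The extra joins mentioned above show up as soon as a dimension is normalized. In this `sqlite3` sketch (hypothetical tables and rows), the product category is moved into its own table, so a query by category needs two joins where a denormalized dimension would need one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension: category is split out into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Books');
INSERT INTO dim_product  VALUES (1, 'Widget', 1), (2, 'Bracket', 1), (3, 'Manual', 2);
INSERT INTO fact_sales   VALUES (1, 10.0), (2, 5.0), (3, 7.5);
""")

# Reaching the category now takes two joins instead of one.
rows = conn.execute("""
    SELECT c.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(rows)
```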
Example of Snowflake Schema