Case - Study of Data Warehouse
https://www.youtube.com/watch?v=E_zFM7mzFUg
Basic Architecture
Data Cube with Multidimensional Display
Scenario
X-Mart operates several malls in our city, where sales of various products take place daily.
Higher management has trouble making decisions because integrated data is not available, so
they cannot analyze their data the way they need to. They have therefore asked us to design a
system that helps them make decisions quickly and provides a return on investment (ROI).
Let us start designing the data warehouse. We need to follow a few steps before we begin the
design itself.
The phases of a data warehouse project listed below are similar to those of most database
projects, starting with identifying requirements and ending with executing the T-SQL script
to create the data warehouse:
We need to interview the key decision makers to learn which factors define success in the
business, how management wants to analyze its data, and which are the most important
business questions that the new system must answer.
We also need to work with people in different departments to understand the data and any
relations the departments have in common, and to document all the requirements that this
system must satisfy.
Let us first identify management's requirements.
We need to design a dimensional model that suits the users' requirements: it must address
business needs and contain information that is easily accessible. The design should be easy
to extend as future needs arise, and it must support OLAP cubes so that analysts get
"instantaneous" query results.
Let us take a quick look at a few new terms, and then we will identify them for our
requirement.
Dimension
Different types of dimensions are available, such as conformed dimension, role-playing
dimension, degenerate dimension, and junk dimension.
A slowly changing dimension (SCD) specifies how you store dimension values that change over
time and whether you preserve their history. Different methods are available for handling
such changes, e.g. SCD1, SCD2, and SCD3; use whichever fits your requirement.
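As a rough illustration of how SCD2 keeps history, the sketch below closes the current row and adds a new version when an attribute changes, whereas SCD1 would simply overwrite the value. The table DimCustomerScd2Demo, its columns, and the city values are all made up for this example and are not part of the case study schema.

```sql
-- Hypothetical SCD Type 2 layout (all names and values are illustrative).
CREATE TABLE DimCustomerScd2Demo
(
    CustomerKey   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key, one per version
    CustomerAltID VARCHAR(10) NOT NULL,                   -- business key from the source system
    CustomerName  VARCHAR(50) NOT NULL,
    City          VARCHAR(50) NOT NULL,
    ValidFrom     DATE NOT NULL,
    ValidTo       DATE NULL,                              -- NULL while the row is current
    IsCurrent     BIT NOT NULL
);
GO

-- Initial version of the customer.
INSERT INTO DimCustomerScd2Demo (CustomerAltID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES ('C001', 'Henry Ford', 'Ahmedabad', '2013-01-01', NULL, 1);
GO

-- The customer moves: close the current version ...
UPDATE DimCustomerScd2Demo
SET ValidTo = '2013-06-30', IsCurrent = 0
WHERE CustomerAltID = 'C001' AND IsCurrent = 1;

-- ... and insert a new version, so both the old and the new city remain queryable.
INSERT INTO DimCustomerScd2Demo (CustomerAltID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES ('C001', 'Henry Ford', 'Surat', '2013-07-01', NULL, 1);
GO
```

Fact rows that reference the old CustomerKey keep pointing at the old city, which is what preserves history in reports.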
Measure
A measure represents a column that contains quantifiable data, usually numeric, that can be
aggregated. A measure is generally mapped to a column in a fact table. Measures come in
several types, e.g. additive, semi-additive, and non-additive.
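The difference between additive and non-additive measures can be sketched with a query. It assumes the FactProductSales table and the column names that appear in the sample data later in this case study; the DeviationPct ratio is a made-up measure added only to show a non-additive value.

```sql
-- Additive measures can be summed directly across any dimension.
-- A ratio is non-additive: it is recomputed from the summed parts
-- instead of being summed or averaged row by row.
SELECT StoreID,
       SUM(Quantity)       AS TotalQuantity,   -- additive
       SUM(SalesTotalCost) AS TotalSales,      -- additive
       SUM(Deviation)      AS TotalDeviation,  -- additive
       SUM(Deviation) * 100.0
           / NULLIF(SUM(ProductActualCost), 0) AS DeviationPct -- non-additive (hypothetical)
FROM FactProductSales
GROUP BY StoreID;
```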
Fact Table
The data in a fact table are called measures (or dependent attributes). In our case, the fact
table provides sales statistics broken down by the customer, salesperson, product, period, and
store dimensions. A fact table usually contains the historical transactional entries of your
live system. It is mainly made up of foreign key columns, which reference the various
dimensions, and numeric measure values on which aggregation will be performed. Fact tables
come in different types, e.g. transactional, cumulative, and snapshot.
Let us identify which attributes our Fact Sales table should contain.
Sales Date key, Sales Time key, Invoice Number, Sales Person ID, Store ID, Customer ID
2. Measures
We have done the basic work of identifying dimensions and measures; now we have to use an
appropriate schema to relate these dimension and fact tables,
e.g. star schema, snowflake schema, starflake schema, distributed star schema, etc.
In a different article, we will discuss all these schemas, dimension types, measure types,
etc., in detail.
Personally, I will first try the star schema because of the hierarchical attribute model it
provides for analysis and its fast query performance.
In a star schema, the diagram resembles a star, with points radiating from a center. The
center of the star is the fact table, and the points of the star are the dimension tables.
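In query terms, a star schema means that each analysis joins the central fact table directly to the dimension tables it needs. The following is a minimal sketch, assuming the table names used later in this case study (FactProductSales, DimStores, Dimproduct) and two hypothetical descriptive columns, StoreName and ProductName, which this document does not specify.

```sql
-- Star join: one hop from the fact table to each dimension.
-- StoreName and ProductName are assumed column names.
SELECT s.StoreName,
       p.ProductName,
       SUM(f.Quantity)       AS TotalQuantity,
       SUM(f.SalesTotalCost) AS TotalSales
FROM FactProductSales AS f
JOIN DimStores  AS s ON s.StoreID    = f.StoreID
JOIN Dimproduct AS p ON p.ProductKey = f.ProductID
GROUP BY s.StoreName, p.ProductName
ORDER BY s.StoreName, p.ProductName;
```

Because every dimension is a single join away from the fact table, queries like this stay short and are usually easy for the optimizer to handle.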
Let us create our first star schema; please refer to the figure below:
Let us execute our T-SQL script step by step to create the tables and populate them with
appropriate test values.
Follow the given steps to run the queries in SSMS (SQL Server Management Studio).
In case the “sa” login was not created at installation time, log in with Windows
authentication, create a new user “sa”, assign a password, and give it full rights.
Step 1
Step 3
Step 4
Create the Store Dimension table, which will hold the details of the stores available across
various places.
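A possible shape for this table is sketched below. Only the table name DimStores and the key column StoreID appear in this case study (in the foreign key script); every other column is an assumption made for illustration.

```sql
-- Sketch of a store dimension. StoreAltID, StoreName, and City are assumed columns.
CREATE TABLE DimStores
(
    StoreID    INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key referenced by the fact table
    StoreAltID VARCHAR(10)  NOT NULL,                  -- hypothetical business code of the store
    StoreName  VARCHAR(100) NOT NULL,
    City       VARCHAR(100) NULL
);
GO
```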
Step 5
Create the Sales Person Dimension table, which will hold the details of the salespersons
working at the various stores.
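A comparable sketch for the salesperson dimension follows. The table name Dimsalesperson and the key column SalesPersonID come from the foreign key script in this case study; the remaining columns are assumptions.

```sql
-- Sketch of a salesperson dimension. SalesPersonAltID and SalesPersonName are assumed columns.
CREATE TABLE Dimsalesperson
(
    SalesPersonID    INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key referenced by the fact table
    SalesPersonAltID VARCHAR(10)  NOT NULL,                  -- hypothetical employee code
    SalesPersonName  VARCHAR(100) NOT NULL
);
GO
```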
Step 6
Create the Date Dimension table, which holds date data broken down into various levels.
Download the script from Moodle and run it in this database to create the date dimension
and fill it with values.
Step 7
Create the Time Dimension table, which holds time data for the entire day in various time
buckets. Download the script and run it in this database to create the time dimension and
fill it with values.
Step 8
Create the fact table to hold all the transactional entries of the previous day's sales, with
appropriate foreign key columns that refer to the primary key columns of your dimensions.
When populating the fact table, take care to use the primary key values of the appropriate
dimension rows.
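The fact table itself might be created along the following lines. The column names are taken from the column list that accompanies the sample data in this case study; the data types and nullability are assumptions and must match the key types of your dimension tables for the foreign keys to be accepted.

```sql
-- Sketch of the fact table. Column names follow the sample INSERT script;
-- all data types are assumed.
CREATE TABLE FactProductSales
(
    SalesInvoiceNumber INT NOT NULL,
    SalesDateKey       INT NOT NULL,   -- refers to the date dimension
    SalesTimeKey       INT NOT NULL,   -- refers to the time dimension
    SalesTimeAltKey    INT NOT NULL,
    StoreID            INT NOT NULL,   -- refers to DimStores
    CustomerID         INT NOT NULL,   -- refers to Dimcustomer
    ProductID          INT NOT NULL,   -- refers to Dimproduct
    SalesPersonID      INT NOT NULL,   -- refers to Dimsalesperson
    Quantity           INT NOT NULL,
    ProductActualCost  DECIMAL(10,2) NOT NULL,
    SalesTotalCost     DECIMAL(10,2) NOT NULL,
    Deviation          DECIMAL(10,2) NOT NULL
);
GO
```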
For example, customer Henry Ford purchased 2 items (sunflower oil 1 kg, and 2 Nirma soap)
in a single invoice on 1-jan-2013 from D-mart at Sivranjani, the salesperson was Jacob, and
the billing time recorded was 13:00. Let us define how we will refer to the primary key
values from each dimension.
Before filling the fact table, you have to look up the primary key values in the dimensions,
as in the given example, and fill the foreign key columns of the fact table with the
appropriate key values.
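The lookup for the example sale could be sketched as follows. CustomerName and SalesPersonName are assumed column names, since the dimension scripts are not reproduced here. The key formats are inferred from the sample rows in this case study, where the date key is the date written as yyyymmdd and the time key is the number of seconds since midnight (for instance, 12:21:59 is stored as 44519).

```sql
-- Look up the dimension keys for the example sale before inserting the fact rows.
-- CustomerName and SalesPersonName are assumed column names.
DECLARE @CustomerID INT =
    (SELECT CustomerID FROM Dimcustomer WHERE CustomerName = 'Henry Ford');
DECLARE @SalesPersonID INT =
    (SELECT SalesPersonID FROM Dimsalesperson WHERE SalesPersonName = 'Jacob');

-- Date key: 1-jan-2013 formatted as yyyymmdd (style 112).
DECLARE @SalesDateKey INT =
    CONVERT(INT, CONVERT(CHAR(8), CAST('2013-01-01' AS DATE), 112));

-- Time key: seconds elapsed since midnight for the billing time 13:00.
DECLARE @SalesTimeKey INT =
    DATEDIFF(SECOND, CAST('00:00:00' AS TIME), CAST('13:00:00' AS TIME));

SELECT @CustomerID    AS CustomerID,
       @SalesPersonID AS SalesPersonID,
       @SalesDateKey  AS SalesDateKey,
       @SalesTimeKey  AS SalesTimeKey;
```

Under these assumptions, the date key for the example is 20130101 and the time key is 46800. The store and product keys would be looked up in the same way from their dimension tables.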
-- Add relations between the fact table foreign keys and the primary keys of the dimensions
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_StoreID FOREIGN KEY (StoreID) REFERENCES DimStores(StoreID);
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_CustomerID FOREIGN KEY (CustomerID) REFERENCES Dimcustomer(CustomerID);
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_ProductKey FOREIGN KEY (ProductID) REFERENCES Dimproduct(ProductKey);
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_SalesPersonID FOREIGN KEY (SalesPersonID) REFERENCES Dimsalesperson(SalesPersonID);
GO
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_SalesDateKey FOREIGN KEY (SalesDateKey) REFERENCES DimDate(DateKey);
GO
-- TimeKey belongs to the time dimension; DimTime is the assumed name of the table from Step 7
ALTER TABLE FactProductSales ADD CONSTRAINT
FK_SalesTimeKey FOREIGN KEY (SalesTimeKey) REFERENCES DimTime(TimeKey);
GO
Populate your fact table with the historical sales transactions of the previous days, using
the proper dimension key values.
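INSERT INTO FactProductSales(SalesInvoiceNumber,SalesDateKey,SalesTimeKey,SalesTimeAltKey,
StoreID,CustomerID,ProductID,SalesPersonID,Quantity,ProductActualCost,SalesTotalCost,Deviation)
VALUES
(2,20130101,44519,122159,1,2,3,1,1,42,43.5,1.5),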
(2,20130101,44519,122159,1,2,4,1,3,54,60,6),
(3,20130101,52415,143335,1,3,2,2,2,11,13,2),
(3,20130101,52415,143335,1,3,3,2,1,42,43.5,1.5),
(3,20130101,52415,143335,1,3,4,2,3,54,60,6),
(3,20130101,52415,143335,1,3,5,2,1,135,139,4),
--2-jan-2013
--(SalesInvoiceNumber,SalesDateKey,SalesTimeKey,SalesTimeAltKey,StoreID,CustomerID,ProductID,
-- SalesPersonID,Quantity,ProductActualCost,SalesTotalCost,Deviation)
(4,20130102,44347,121907,1,1,1,1,2,11,13,2),
(4,20130102,44347,121907,1,1,2,1,1,22.50,24,1.5),
(5,20130102,44519,122159,1,2,3,1,1,42,43.5,1.5),
(5,20130102,44519,122159,1,2,4,1,3,54,60,6),
(6,20130102,52415,143335,1,3,2,2,2,11,13,2),
(6,20130102,52415,143335,1,3,5,2,1,135,139,4),
(7,20130102,44347,121907,2,1,4,3,3,54,60,6),
(7,20130102,44347,121907,2,1,5,3,1,135,139,4),
--3-jan-2013
--(SalesInvoiceNumber,SalesDateKey,SalesTimeKey,SalesTimeAltKey,StoreID,CustomerID,ProductID,
-- SalesPersonID,Quantity,ProductActualCost,SalesTotalCost,Deviation)
(8,20130103,59326,162846,1,1,3,1,2,84,87,3),
(8,20130103,59326,162846,1,1,4,1,3,54,60,3),
(9,20130103,59349,162909,1,2,1,1,1,5.5,6.5,1),
(9,20130103,59349,162909,1,2,2,1,1,22.50,24,1.5),
(10,20130103,67390,184310,1,3,1,2,2,11,13,2),
(10,20130103,67390,184310,1,3,4,2,3,54,60,6),
(11,20130103,74877,204757,2,1,2,3,1,5.5,6.5,1),
(11,20130103,74877,204757,2,1,3,3,1,42,43.5,1.5)
Go
After executing the above T-SQL script, your sample data warehouse for sales will be ready,
and you can create an OLAP cube on the basis of this data warehouse.
To generate an OLAP cube, we need to design it in SQL Server Business Intelligence. For
practice, however, let us try to write queries for a two-dimensional or three-dimensional cube.
Example of a two-dimensional cube query:
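One way to write such a query is with GROUP BY CUBE, which is available in SQL Server 2008 and later. The sketch below uses only the FactProductSales columns listed with the sample data; choosing StoreID and ProductID as the two dimensions is an assumption made for illustration.

```sql
-- Two-dimensional cube over store and product.
-- CUBE returns totals for every (store, product) pair, for each store over
-- all products, for each product over all stores, and one grand total.
SELECT StoreID,
       ProductID,
       SUM(Quantity)       AS TotalQuantity,
       SUM(SalesTotalCost) AS TotalSales
FROM FactProductSales
GROUP BY CUBE (StoreID, ProductID)
ORDER BY GROUPING(StoreID), StoreID, GROUPING(ProductID), ProductID;
```

In the result, a NULL in StoreID or ProductID marks a subtotal row that covers all values of that dimension. Adding a third column, for example SalesDateKey, to both the SELECT list and the CUBE list turns this into a three-dimensional cube query.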
Exercise
1. Draw the schema using a database diagram.
2. Populate the data and generate the different dimensional cube views.
3. Report the analysis results in tabular form.