Assignment 4
HOMEWORK -4
Submitted by
Dwarakanath Gutta
1002126239
Instructor
Dr. A. C. Sahoo
Problems
1. You should identify dimensions, map dimensions to data sources, and specify dimension hierarchies.
For each dimension, you should identify its data sources and attributes in each data source. For
hierarchical dimensions, you should indicate the levels from broad to narrow.
Supplier dimension:
1. The Supplier table serves as the data source, and SuppNo is the primary key for the Supplier dimension.
2. The attributes taken from the Supplier table are SuppName, SuppPhone, SuppEmail, and SuppDisc.
3. SuppPhone is hierarchical; from broad to narrow, its levels are country code, area code, and phone number.
4. SuppEmail is hierarchical; from broad to narrow, its levels are top-level domain, second-level domain, and
local part.
Product dimension:
1. The Product table serves as the data source, and ProdNo is the primary key for the Product dimension.
2. The attributes taken from the Product table are ProdName and ProdQOH.
Time dimension:
1. The Time dimension is based on the date attribute PurchDate from the spreadsheet. It is hierarchical.
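The SuppEmail hierarchy described above can be illustrated with a minimal Python sketch. The function name `email_hierarchy` and the sample address are assumptions made for illustration only; they do not come from the assignment data.

```python
def email_hierarchy(email: str) -> dict:
    """Split an e-mail address into hierarchy levels, ordered broad to narrow."""
    # Everything after the last "@" is the domain; the rest is the local part.
    local_part, _, domain = email.rpartition("@")
    labels = domain.split(".")
    return {
        "top_level_domain": labels[-1],
        "second_level_domain": labels[-2] if len(labels) > 1 else "",
        "local_part": local_part,
    }


# Hypothetical address, not taken from the Supplier table.
levels = email_hierarchy("sales@example.com")
print(levels)
```

A roll-up by top-level domain or second-level domain would then group suppliers on the corresponding key of this dictionary.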
2. You should specify measures, related data sources, and measure aggregation properties.
Sol: The following table specifies the measures, related data sources, and measure aggregation properties:

Measure    Data source  Aggregation property
UnitPrice  Spreadsheet  Snapshot measure
Amount     Spreadsheet  Derived additive measure
Qty        Spreadsheet  Additive measure
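To make the aggregation properties concrete, the following Python sketch sums the additive measures and treats UnitPrice differently. It assumes that Amount is derived as Qty * UnitPrice, which the solution does not state explicitly, and the purchase lines are made-up values rather than rows from the spreadsheet.

```python
# Hypothetical purchase lines as (Qty, UnitPrice); the values are illustrative only.
purchase_lines = [(10, 200.00), (15, 450.00), (25, 12.50)]

# Assumption: Amount is derived from the other two measures as Qty * UnitPrice.
amounts = [qty * unit_price for qty, unit_price in purchase_lines]

# Additive measures can be summed across rows.
total_qty = sum(qty for qty, _ in purchase_lines)
total_amount = sum(amounts)

# UnitPrice is not summed across rows; a quantity-weighted average is one possible summary.
avg_unit_price = total_amount / total_qty

print(total_qty, total_amount, avg_unit_price)
```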
3. Identify the grain in your dimensional design using the business needs as a guideline. You should then
indicate relative storage requirements for the grain using the statistics for the data sources. Using the
cardinality estimates provided, you should determine either the fact table size or sparsity and then
compute the unknown grain size variable. For example, you should compute sparsity if the fact table size
is given.
• The Product dimension has 1,100 rows, the total of the product rows and the unique product rows.
• The Supplier dimension has 60 rows, the total of the supplier rows and the unique supplier rows.
• The fact table size is the sum of the rows from the spreadsheet and the Purchaseline table:
500,000 + 1,000 * 12 = 500,000 + 12,000 = 512,000.
• Sparsity = 1 - (fact table size / product of dimension sizes) = 1 - (512,000 / (1,100 * 60 * 365)) = 0.9787,
based on the statistics provided.
• With a sparsity of about 0.98, roughly 98% of the cells in the data cube are empty; only about 2% contain facts.
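The sparsity arithmetic above can be checked with a short Python sketch. The function name `sparsity` is an assumption for illustration; the numbers are the ones used in the bullets.

```python
from typing import Sequence


def sparsity(fact_rows: int, dimension_sizes: Sequence[int]) -> float:
    """Return 1 - (fact table size / product of the dimension sizes)."""
    cells = 1
    for size in dimension_sizes:
        cells *= size
    return 1 - fact_rows / cells


# Fact table size from the bullets above: 500,000 + 1,000 * 12.
fact_rows = 500_000 + 1000 * 12

# Dimension sizes: 1,100 products, 60 suppliers, 365 days.
value = sparsity(fact_rows, [1100, 60, 365])
print(round(value, 4))  # 0.9787
```

The complement, 1 - sparsity, is the share of cube cells that actually hold a fact row, about 2% here.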
4. Extend your analysis to design a star schema (or variation) to support inventory analysis. For each
table, you should define the table name, primary key, and columns. You do not need to write complete
CREATE TABLE statements.
5. Identify summarizability problems in your star schema and indicate preferred resolutions of the
summarizability problems. For incomplete dimension-fact relationships, you should also indicate if
columns in a dimension table allow null values.
• The supply spreadsheet does not include the delivery date, so the DelDate link is an incomplete fact-dimension
relationship. The existing data cannot be corrected, but the spreadsheet can be changed so that the delivery date is
tracked in future data.
• For the existing rows, the purchase date may be used as the default value for the delivery date.
• The spreadsheet does not include the SuppEmail and SuppPhone attributes for certain suppliers, so SuppEmail and
SuppPhone are incomplete. Since there is no suitable default value, additional data must be collected to address
this shortcoming.
6. You should populate your data warehouse tables based on the data in the sample tables and spreadsheet.
You do not need to write SQL INSERT statements or insert the data into your tables. You can just show
table listings in your solution. You should indicate mappings from data sources into tables. For example,
mapping may involve generating new primary key values for a data warehouse table or using a default
value for a missing value.
Sol: The sample data for the data warehouse tables is taken from the source tables and the spreadsheet. Because
the source data does not include a delivery date, the delivery date is set to the purchase date as its default
value. Surrogate primary key values are generated for the rows loaded from the data sources.
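The two mappings described in this solution, a generated surrogate key and a default for the missing delivery date, can be sketched in Python as follows. The function `map_purchase_row` and the column name `PurchKey` are hypothetical; PurchDate and DelDate follow the names used earlier in this assignment.

```python
from datetime import date
from itertools import count
from typing import Iterator, Optional


def map_purchase_row(keys: Iterator[int], purch_date: date,
                     del_date: Optional[date]) -> dict:
    """Map one source row to a warehouse row (hypothetical layout)."""
    return {
        "PurchKey": next(keys),  # generated surrogate key (hypothetical column)
        "PurchDate": purch_date,
        # A missing delivery date defaults to the purchase date.
        "DelDate": del_date if del_date is not None else purch_date,
    }


keys = count(start=1)
row = map_purchase_row(keys, date(2013, 2, 1), None)
print(row)
```

Passing the key generator in as an argument keeps the numbering continuous across all rows loaded in one run.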
Sample Data for the Product Dimension Table
ProdNo    ProdName
P0036566  17 inch Color Monitor
P0036577  19 inch Color Monitor
P1114590  R3000 Color Laser Printer
P1412138  10 Foot Printer Cable
P1445671  8-Outlet Surge Protector
P1556678  CVP Ink Jet Color Printer
P3455443  Color Ink Jet Cartridge
P4200344  36-Bit Color Scanner
P6677900  Black Ink Jet Cartridge
P9995676  Battery Back-up System
Id Day Month Year
C10000211 1 2 2013
C10000212 2 2 2013
C10000213 3 2 2013
C10000214 4 2 2013
C10000215 5 2 2013
C10000216 6 2 2013
C10000217 7 2 2013
C10000218 8 2 2013
C10000219 9 2 2013
C10000220 10 2 2013
C10000221 11 2 2013
C10000222 12 2 2013
C10000223 13 2 2013
C10000224 14 2 2013
C10000225 15 2 2013
C10000226 16 2 2013
C10000227 17 2 2013
I2224041 P0036577 S2029929 10 $200.00 10 $319.00 0.10
I2224042 P9995676 S5095332 10 $45.00 12 $89.00 0.00
I2224043 P1114590 S3399214 15 $450.00 5 $699.00 0.12
I2224044 P1556678 S3399214 10 $50.00 8 $99.00 0.12
I2224045 P3455443 S3399214 25 $21.95 24 $38.00 0.12
I2224046 P6677900 S3399214 25 $12.50 44 $25.69 0.12
I2224047 P1412138 S4290202 50 $6.50 100 $12.00 0.05
I2224048 P4200344 S4420948 15 $99.00 16 $199.99 0.08
INS- 5337
Data Warehousing and Business Intelligence
Academic Integrity
In order for your Assignment/Homework/Project to be accepted you must read the following, sign this
form and attach it to your papers (as the last page of your assignment).
Academic Integrity: Students enrolled in this course are expected to adhere to the UT Arlington Honor Code:
I pledge, on my honor, to uphold UT Arlington’s tradition of academic integrity, a tradition that values
hard work and honest effort in the pursuit of academic excellence.
I promise that I will submit only work that I personally create or contribute to group collaborations, and
I will appropriately reference any work from other sources. I will follow the highest standards of
integrity and uphold the spirit of the Honor Code.