Data Mining Class Test
An online wine retailer needs a data warehouse to record the quantity and sales
of the wines it sells to its customers. Part of the original database consists of the following tables:
Note that the tables represent the main entities of the ER schema; it is therefore necessary to derive the
significant relationships among them in order to design the data warehouse correctly.
i. Design a conceptual schema (Attribute tree and Fact schema) for sales.
ii. Design a Star Schema and a Snowflake Schema.
SET 02
Suppose the following is the set of sales transactions of a super-shop company:
TransactionID Itemset
T1 6,7,8,5,4,10
T2 3,8,7,5,4,10
T3 6,1,5,4
T4 6,9,2,5,10
T5 2,8,8,5,4
I. Generate the candidate itemsets and frequent itemsets with a minimum support count of 3.
II. Generate association rules from the frequent itemsets you generated.
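The level-wise procedure asked for in I and II can be sketched as follows. This is a minimal illustration, not a model answer: it assumes the Apriori algorithm is intended, treats the repeated 8 in T5 as a single item, and reports the confidence of every rule because the question gives no minimum confidence threshold. The function names are hypothetical.

```python
from itertools import combinations

# Transactions from the table above; sets collapse the repeated 8 in T5.
transactions = [
    {6, 7, 8, 5, 4, 10},  # T1
    {3, 8, 7, 5, 4, 10},  # T2
    {6, 1, 5, 4},         # T3
    {6, 9, 2, 5, 10},     # T4
    {2, 8, 8, 5, 4},      # T5
]
MIN_SUPPORT = 3


def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that reach the minimum support count.
        level = {c: support(c) for c in candidates if support(c) >= MIN_SUPPORT}
        frequent.update(level)
        k += 1
        # Next candidates have size k; prune any with an infrequent (k-1)-subset.
        pool = sorted(set().union(*level))
        candidates = [
            frozenset(c)
            for c in combinations(pool, k)
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent):
    """List (antecedent, consequent, confidence) for itemsets of size >= 2."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                out.append((set(lhs), set(itemset - lhs), count / frequent[lhs]))
    return out


frequent = apriori()
assoc_rules = rules(frequent)
for lhs, rhs, conf in assoc_rules:
    print(f"{sorted(lhs)} -> {sorted(rhs)}  confidence={conf:.2f}")
```

Confidence here is the support count of the whole itemset divided by the support count of the antecedent; a minimum confidence threshold, if one is given, can be applied as a filter on the third element of each tuple.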
SET 03
Given the following data (Money possessed by each of the 141 people in an area):
Range of Money (Thousands)  1-10  11-20  21-30  31-40  41-50  51-60  61-70  71-80
No. of People               5     6      11     21     35     30     22     11
Deduce Pearson's coefficient of skewness. Also interpret the value of the coefficient you derived.
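One way to check the arithmetic for this grouped distribution is sketched below. It rests on several assumptions that the question does not state: class midpoints stand in for the individual values, class boundaries are taken as 0.5 below the lower limit (so 40.5 to 50.5 for the class 41-50), the standard deviation is the population form (divide by N), and both the mode-based and the median-based versions of Pearson's coefficient are computed because the question does not say which is wanted.

```python
from math import sqrt

lower = [1, 11, 21, 31, 41, 51, 61, 71]    # lower class limits
upper = [10, 20, 30, 40, 50, 60, 70, 80]   # upper class limits
freq = [5, 6, 11, 21, 35, 30, 22, 11]      # number of people per class

n = sum(freq)
mid = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]  # class midpoints
width = 10  # class width measured between class boundaries

mean = sum(f * x for f, x in zip(freq, mid)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freq, mid)) / n
sd = sqrt(variance)

# Median by linear interpolation inside the class that contains the N/2-th value.
cum = 0
for i, f in enumerate(freq):
    if cum + f >= n / 2:
        median = (lower[i] - 0.5) + (n / 2 - cum) / f * width
        break
    cum += f

# Mode by the usual grouped-data formula on the class with the highest frequency.
m = freq.index(max(freq))
f0 = freq[m - 1] if m > 0 else 0
f2 = freq[m + 1] if m + 1 < len(freq) else 0
mode = (lower[m] - 0.5) + (freq[m] - f0) / (2 * freq[m] - f0 - f2) * width

sk_mode = (mean - mode) / sd          # Pearson's first coefficient
sk_median = 3 * (mean - median) / sd  # Pearson's second coefficient

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f} sd={sd:.3f}")
print(f"skewness (mode)={sk_mode:.3f} skewness (median)={sk_median:.3f}")
```

Under these assumptions both coefficients come out slightly negative (roughly -0.03 and -0.18), which would indicate a mildly left-skewed distribution: the mean lies a little below the median and the mode.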
SET 04
Divide the following two-feature (X1, X2) data instances into two clusters using the k-means algorithm,
iterating until convergence.
X1 1 2 2 3 4 5
X2 1 1 3 2 3 5
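The iteration can be traced with a short sketch such as the one below. The question does not give initial centroids, so the first two points, (1, 1) and (2, 1), are used here purely as an assumption; a different initialisation may converge to a different partition. Squared Euclidean distance is assumed, and the function name is hypothetical.

```python
points = [(1, 1), (2, 1), (2, 3), (3, 2), (4, 3), (5, 5)]  # (X1, X2) pairs


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign, update, stop when assignments no longer change."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        centroids = updated
    return labels, centroids


# Assumed initial centroids: the first two data points.
labels, centroids = kmeans(points, [points[0], points[1]])
print(labels)
print(centroids)
```

With this particular initialisation the assignments stop changing after the third assignment step, giving the clusters {(1, 1), (2, 1)} and {(2, 3), (3, 2), (4, 3), (5, 5)}.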
SET 05
The following table shows data from a survey among businessmen. “Business experience”, “Competition”, and
“Business Type” are feature attributes, while “Profit” is the target class attribute.
Business experience  Competition  Business Type  Profit
Old Yes Software Down
Old No Software Down
Old No Hardware Down
Mid Yes Software Down
Mid Yes Hardware Down
Mid No Hardware Up
Mid No Software Up
New Yes Software Up
New No Hardware Up
New No Software Up
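The task statement for this set is not included above. Assuming the intended exercise is a classification task such as building a decision tree, a natural first step is to compare the information gain of each feature attribute with respect to “Profit”. The sketch below does only that, using base-2 entropy; the choice of information gain as the criterion is an assumption, not something the question specifies.

```python
from collections import Counter
from math import log2

columns = ["Business experience", "Competition", "Business Type"]
rows = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No", "Software", "Down"),
    ("Old", "No", "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No", "Hardware", "Up"),
    ("Mid", "No", "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No", "Hardware", "Up"),
    ("New", "No", "Software", "Up"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(col):
    """Entropy of Profit minus the weighted entropy after splitting on column `col`."""
    target = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(target) - remainder


gains = {name: information_gain(i) for i, name in enumerate(columns)}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("highest gain:", best)
```

On this table “Business experience” separates the classes best, because the Old and New groups are each pure, while “Business Type” gives no gain at all.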
SET 06
The incomes (in thousand taka) of 10 farmers in a village are as follows:
50, 45, 11, 12, 80, 16, 17, 15, 14, 30
Derive the standard deviation and then normalize the above data using z-score.
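A short sketch of the computation follows. It assumes the population standard deviation (divide by N) is intended, since the ten farmers are presented as the whole data set; if the sample form is wanted, `statistics.stdev` can be substituted for `statistics.pstdev`.

```python
from statistics import mean, pstdev

incomes = [50, 45, 11, 12, 80, 16, 17, 15, 14, 30]  # thousand taka

mu = mean(incomes)
sigma = pstdev(incomes)  # population standard deviation (assumption)

# z-score normalization: subtract the mean, divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in incomes]

print(f"mean={mu} sd={sigma:.4f}")
for x, z in zip(incomes, z_scores):
    print(f"{x:>3} -> {z:+.3f}")
```

The normalized values have mean 0 and (population) standard deviation 1, so a value such as 80 shows up as a clear positive outlier.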
SET 07
Deduce the contingency table and (by calculating the Pearson chi-square statistic) find out whether any
correlation exists between the choice of sports type and the age of the people.
SET 08
Following are two relational schemas from two data sources:
Source 01:
ProductProduced (ProductProducedCode, Name, Description, Warnings, Notes, CatalogueID)
ProductVersion (ProductProducedCode, ProductVersionCode, Size, Color, Name, Description, Stock, Price)
Source 02:
Commodity (CommodityCode, Name, Size, Color, CommodityIntro, Type, Price, InventoryQuantity)
Item (ItemCode, Name, Description)
Perform a view-based integration (create a global relational schema) using the Local-as-View (LAV)
approach.