Assignment 1 5
(f) Predicting the future stock price of a company using historical records.
2. Suppose that you are employed as a data mining consultant for an Internet search engine
company. Describe how data mining can help the company by giving specific examples of how
techniques such as clustering, classification, association rule mining, and anomaly detection can be
applied.
ASSIGNMENT 2 CO2
To be submitted on or before 26/11/2024
1. Often, the aggregate measure value of many cells in a large data cuboid is zero, resulting in a huge,
yet sparse, multidimensional matrix.
(a) Design an implementation method that can elegantly overcome this sparse matrix problem. Note
that you need to explain your data structures in detail and discuss the space needed, as well as how to
retrieve data from your structures.
(b) Modify your design in (a) to handle incremental data updates. Give the reasoning behind your new
design.
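As a rough illustration of the sparsity problem in question 1, the sketch below stores only the non-zero cells of a cuboid in a hash map keyed by the cell's coordinates, so space grows with the number of non-empty cells rather than with the product of the dimension sizes. The class name SparseCuboid, its methods, the SUM aggregate, and the sample dimension values are all assumptions made for this example; it is one possible starting point, not a complete answer.

```python
class SparseCuboid:
    """Keeps only non-zero cells, keyed by their coordinate tuple."""

    def __init__(self, n_dims):
        self.n_dims = n_dims
        self.cells = {}  # (d1, d2, ..., dn) -> aggregated measure

    def add(self, coords, value):
        """Fold a new measure value into a cell (SUM aggregate assumed)."""
        if len(coords) != self.n_dims:
            raise ValueError("wrong number of dimensions")
        new_value = self.cells.get(coords, 0) + value
        if new_value == 0:
            # Drop cells that return to zero so the structure stays sparse.
            self.cells.pop(coords, None)
        else:
            self.cells[coords] = new_value

    def get(self, coords):
        """Missing cells are implicitly zero."""
        return self.cells.get(coords, 0)

    def __len__(self):
        return len(self.cells)


# Hypothetical 3-D cuboid: (quarter, city, item)
cube = SparseCuboid(3)
cube.add(("Q1", "Delhi", "Tea"), 120)
cube.add(("Q2", "Pune", "Jam"), 45)
cube.add(("Q1", "Delhi", "Tea"), 30)   # incremental update of an existing cell

print(cube.get(("Q1", "Delhi", "Tea")))  # 150
print(cube.get(("Q3", "Pune", "Tea")))   # 0 (cell was never stored)
print(len(cube))                          # 2
```

Lookup and update are expected constant time per cell; a full answer would still need to discuss range queries and the per-entry overhead of storing the coordinates.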
2. Suppose a data cube has D dimensions, and the base cuboid contains k distinct
tuples.
(a) Present a formula to calculate the minimum number of cells that the cube, C, may contain.
(b) Present a formula to calculate the maximum number of cells that C may contain.
(c) Answer parts (a) and (b) above as if the count in each cube cell must be no less than a threshold, v.
(d) Answer parts (a) and (b) above as if only closed cells are considered (with the minimum count
threshold,
ASSIGNMENT 3 CO3
To be submitted on or before 12/11/2024
Write the Apriori algorithm for discovering frequent itemsets for mining Boolean association rules, and
explain the algorithm in detail.
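For orientation, a minimal sketch of the level-wise structure of Apriori (count, join, prune) is given below. The function name apriori, its parameters, and the toy transactions are assumptions made for this example; the transactions are invented and are not the assignment's database.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_sup is a fraction of the number of transactions (e.g. 0.6).
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        # One scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k for c, k in counts.items() if k / n >= min_sup}

    items = {frozenset([i]) for t in transactions for i in t}
    level = count(items)            # frequent 1-itemsets
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Invented toy data, not the assignment's transactions.
toy = [
    {"tea", "sugar", "biscuit"},
    {"tea", "sugar"},
    {"tea", "sugar", "biscuit"},
    {"sugar", "jam"},
]
result = apriori(toy, min_sup=0.6)
for itemset, support_count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support_count)
```

With four transactions and a 60% threshold, an itemset needs at least three supporting transactions, so only {sugar}, {tea}, and {sugar, tea} survive in this toy run.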
A database has four transactions. Let min_sup = 60% and min_conf = 80%.
(a) At the granularity of item category (e.g., item_i could be "Milk"), for the following rule template,
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c],
list the frequent k-itemset for the largest k, and all of the strong association rules (with their support
s and confidence c) containing the frequent k-itemset for the largest k.
(b) At the granularity of brand-item category (e.g., item_i could be "Sunset-Milk"), for the following
rule template,
∀X ∈ customer, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3)
list the frequent k-itemset for the largest k (but do not print any rules).
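As a reminder of how the support s and confidence c attached to a rule of the above template are computed, a small sketch follows. The helper names support, confidence, and is_strong, and the toy transactions, are assumptions made for this example; the data is invented and is not the assignment's database.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    s_ante = support(transactions, antecedent)
    if s_ante == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_ante


def is_strong(transactions, antecedent, consequent, min_sup, min_conf):
    """A rule is strong when it meets both thresholds."""
    s = support(transactions, set(antecedent) | set(consequent))
    c = confidence(transactions, antecedent, consequent)
    return s >= min_sup and c >= min_conf


# Invented toy data, not the assignment's transactions.
toy = [
    {"tea", "sugar", "biscuit"},
    {"tea", "sugar", "biscuit"},
    {"tea", "sugar", "biscuit", "jam"},
    {"tea", "jam"},
]

# buys(X, tea) AND buys(X, sugar) => buys(X, biscuit)
print(support(toy, {"tea", "sugar", "biscuit"}))                # 0.75
print(confidence(toy, {"tea", "sugar"}, {"biscuit"}))           # 1.0
print(is_strong(toy, {"tea", "sugar"}, {"biscuit"}, 0.6, 0.8))  # True

# buys(X, tea) AND buys(X, jam) => buys(X, sugar)
print(is_strong(toy, {"tea", "jam"}, {"sugar"}, 0.6, 0.8))      # False
```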