Assignment 1 5

Uploaded by

koshmitha.28
Copyright
© © All Rights Reserved

ASSIGNMENT 1 CO1

To be submitted on or before 22/11/2024


1. Discuss whether or not each of the following activities is a data mining task.
(a) Dividing the customers of a company according to their gender.

(b) Dividing the customers of a company according to their profitability.

(c) Computing the total sales of a company.

(d) Sorting a student database based on student identification numbers.

(e) Predicting the outcomes of tossing a (fair) pair of dice.

(f) Predicting the future stock price of a company using historical records.

(g) Monitoring the heart rate of a patient for abnormalities.

(h) Monitoring seismic waves for earthquake activities.

(i) Extracting the frequencies of a sound wave.

2. Suppose that you are employed as a data mining consultant for an Internet search engine
company. Describe how data mining can help the company by giving specific examples of how
techniques such as clustering, classification, association rule mining, and anomaly detection can be
applied.

ASSIGNMENT 2 CO2
To be submitted on or before 26/11/2024
1. Often, the aggregate measure value of many cells in a large data cuboid is zero, resulting in a huge,
yet sparse, multidimensional matrix.
(a) Design an implementation method that can elegantly overcome this sparse matrix problem. Note
that you need to explain your data structures in detail and discuss the space needed, as well as how to
retrieve data from your structures.
(b) Modify your design in (a) to handle incremental data updates. Give the reasoning behind your new
design.
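
As one possible starting point for question 1, the sketch below stores only the nonzero cells of a cuboid in a hash map keyed by the cell's coordinate tuple. This is an illustrative assumption, not the required design: the class name SparseCuboid and its methods are hypothetical, and chunk-based or compressed-array layouts are equally valid answers. Space grows with the number of nonzero cells rather than with the product of the dimension sizes, a lookup is one hash probe on average, and an incremental update is a single in-place addition.

```python
class SparseCuboid:
    """Hypothetical sparse store: only nonzero aggregate cells are kept."""

    def __init__(self, n_dims):
        self.n_dims = n_dims
        self.cells = {}  # coordinate tuple -> nonzero aggregate value

    def add(self, coords, delta):
        """Incremental update: fold a new measure value into one cell."""
        coords = tuple(coords)
        if len(coords) != self.n_dims:
            raise ValueError("wrong number of dimensions")
        value = self.cells.get(coords, 0) + delta
        if value == 0:
            # Keep the structure sparse: never store an explicit zero.
            self.cells.pop(coords, None)
        else:
            self.cells[coords] = value

    def get(self, coords):
        """Retrieve a cell; cells that are not stored are implicitly zero."""
        return self.cells.get(tuple(coords), 0)

    def roll_up(self, keep_dims):
        """Aggregate (sum) over every dimension not listed in keep_dims."""
        result = SparseCuboid(len(keep_dims))
        for coords, value in self.cells.items():
            result.add([coords[d] for d in keep_dims], value)
        return result


# Made-up example data: (year, item, city) -> units sold.
cube = SparseCuboid(3)
cube.add((2024, "Milk", "Vancouver"), 5)
cube.add((2024, "Milk", "Toronto"), 2)
cube.add((2024, "Milk", "Vancouver"), 3)   # incremental update to an existing cell
by_item = cube.roll_up([1])                # total units per item
```

A sum measure is assumed here; a measure that is not distributive (for example, median) would need more than one stored number per cell.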

2. Suppose a data cube has D dimensions, and the base cuboid contains k distinct
tuples.
(a) Present a formula to calculate the minimum number of cells that the cube, C, may contain.
(b) Present a formula to calculate the maximum number of cells that C may contain.
(c) Answer parts (a) and (b) above as if the count in each cube cell must be no less than a threshold, v.
(d) Answer parts (a) and (b) above as if only closed cells are considered (with the minimum count
threshold, v).

ASSIGNMENT 3 CO 3 on or before 12/11/24
Write the Apriori algorithm for discovering frequent itemsets for mining Boolean association rules, and
explain the algorithm in detail.
A database has four transactions. Let min_sup = 60% and min_conf = 80%.

cust_ID  TID   items_bought (in the form of brand-item category)
01       T100  {King's-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread}
02       T200  {Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread}
01       T300  {Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie}
03       T400  {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}

(a) At the granularity of item category (e.g., item_i could be "Milk"), for the following rule template,
∀X ∈ transaction, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3) [s, c],
list the frequent k-itemset for the largest k, and all of the strong association rules (with their support
s and confidence c) containing the frequent k-itemset for the largest k.
(b) At the granularity of brand-item category (e.g., item_i could be "Sunset-Milk"), for the following
rule template,
∀X ∈ customer, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3),
list the frequent k-itemset for the largest k (but do not print any rules).
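
A written answer to part (a) can be cross-checked with a small script. The sketch below is a minimal, unoptimized Apriori (join, prune, count) applied to the four transactions above at item-category granularity. The function names apriori and rules_from are illustrative, and taking the category to be the text after the last hyphen is an assumption about the brand-item naming. It does not cover part (b), which groups purchases by customer rather than by transaction.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support count} for every frequent itemset."""
    n = len(transactions)

    def count_support(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is contained in the transaction
                    counts[c] += 1
        return {c: k for c, k in counts.items() if k / n >= min_sup}

    items = {item for t in transactions for item in t}
    level = count_support({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        level = count_support(candidates)
        frequent.update(level)
        k += 1
    return frequent


def rules_from(itemset, frequent, n, min_conf):
    """Strong rules (itemset - {c}) => c, i.e. a single-item consequent."""
    strong = []
    for c in sorted(itemset):
        antecedent = itemset - {c}
        conf = frequent[itemset] / frequent[antecedent]
        if conf >= min_conf:
            strong.append((tuple(sorted(antecedent)), c,
                           frequent[itemset] / n, conf))
    return strong


db = [
    {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"},
    {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie",
     "Wonder-Bread"},
    {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"},
    {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"},
]
# Assumption: the item category is whatever follows the last hyphen.
by_category = [frozenset(x.rsplit("-", 1)[1] for x in t) for t in db]

frequent = apriori(by_category, min_sup=0.6)
largest_k = max(len(s) for s in frequent)
largest = [s for s in frequent if len(s) == largest_k]
strong = [r for s in largest
          for r in rules_from(s, frequent, len(db), min_conf=0.8)]

print(largest_k, [sorted(s) for s in largest])
for antecedent, consequent, s, c in strong:
    print(antecedent, "=>", consequent, f"[s={s:.0%}, c={c:.0%}]")
```

Candidates are counted with one pass over the database per level, which is fine for four transactions; larger data would call for a hash tree or a similar counting structure.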

ASSIGNMENT 4 CO 4 on or before 12/11/24


1. Why is tree pruning useful in decision tree induction? What is a drawback of using a separate
set of tuples to evaluate pruning?
2. Write an algorithm for k-nearest-neighbor classification given k, the number of nearest
neighbors, and n, the number of attributes describing each tuple.
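
For question 2, one way to check a written algorithm is against a short executable sketch such as the one below. It assumes numeric attributes, Euclidean distance, and an unweighted majority vote; none of these is fixed by the question, and the function name knn_classify and the sample tuples are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training tuples.

    training: list of (attributes, label) pairs, where attributes is a
              sequence of n numbers.
    query:    a sequence of n numbers.
    """
    # Rank training tuples by Euclidean distance to the query (assumed metric).
    ranked = sorted(training, key=lambda pair: math.dist(pair[0], query))
    # Count the class labels of the k closest tuples.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training tuples with n = 2 attributes.
training = [
    ((1.0, 1.0), "low"),
    ((1.0, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.0), "high"),
    ((5.0, 6.0), "high"),
]
label = knn_classify(training, (1.5, 1.5), k=3)
```

Each query costs a full scan and sort of the training set. In practice the attributes are usually normalized first so that no single attribute dominates the distance.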

ASSIGNMENT 5 CO5 on or before 12/11/24


1. Briefly describe the following approaches to clustering: partitioning methods, hierarchical
methods, density-based methods, grid-based methods, model-based methods, methods for
high-dimensional data, and constraint-based methods. Give examples in each case.
