1 Assignment
Q2. Present an example where data mining is crucial to the success of a business. What data
mining functionalities does this business need (e.g., think of the kinds of patterns that could
be mined)? Can such patterns be generated alternatively by data query processing or simple
statistical analysis?
Q3. Briefly outline how to compute the dissimilarity between objects described by the
following:
(a) Nominal attributes
(b) Asymmetric binary attributes
(c) Numeric attributes
(d) Term-frequency vectors
Q4. For the following vectors, x and y, calculate the indicated similarity or distance
measures.
(a) x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
(b) x = (0, 1, 0, 1), y = (1, 0, 1, 0): cosine, correlation, Euclidean, Jaccard
(c) x = (0, −1, 0, 1), y = (1, 0, −1, 0): cosine, correlation, Euclidean
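The measures named in Q4 can be sketched in plain Python as a way to check hand calculations. This is a minimal sketch, not a prescribed solution: the function names are illustrative, correlation is taken to be Pearson's correlation, and Jaccard is taken to be the coefficient for binary vectors (matches on 1 divided by positions where at least one vector is 1). Correlation is returned as None when either vector has zero variance, since the measure is undefined in that case.

```python
import math

def cosine(x, y):
    """Cosine similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def correlation(x, y):
    """Pearson correlation; None when either vector is constant (undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return None
    return cov / (dev_x * dev_y)

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def jaccard(x, y):
    """Jaccard coefficient for binary (0/1) vectors; None if both are all zeros."""
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    any_one = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both_one / any_one if any_one else None

if __name__ == "__main__":
    pairs = {
        "a": ((1, 1, 1, 1), (2, 2, 2, 2)),
        "b": ((0, 1, 0, 1), (1, 0, 1, 0)),
        "c": ((0, -1, 0, 1), (1, 0, -1, 0)),
    }
    for label, (x, y) in pairs.items():
        print(label, cosine(x, y), correlation(x, y), euclidean(x, y))
    print("b Jaccard", jaccard(*pairs["b"]))
```

Note that in part (a) both vectors are constant, so the sketch reports the correlation as undefined rather than forcing a number.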
Q5. In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.
Q6. Given the following data (in increasing order) for the attribute age:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40,
45, 46, 52, 70.
(a) Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate
your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
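The mechanics of smoothing by bin means in Q6(a) can be sketched as follows. This is a minimal illustration that assumes equal-depth (equal-frequency) partitioning of the already sorted data; the names AGE and smooth_by_bin_means are illustrative, and a final bin that is shorter than the depth would simply be averaged over the values it contains.

```python
# Age values from Q6, already in increasing order.
AGE = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
       33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def smooth_by_bin_means(sorted_values, depth):
    """Split sorted values into consecutive bins of `depth` items and
    replace every value by the mean of its bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

if __name__ == "__main__":
    result = smooth_by_bin_means(AGE, depth=3)
    # Print each bin next to its smoothed value.
    for start in range(0, len(AGE), 3):
        print(AGE[start:start + 3], "->", round(result[start], 2))
```

With 27 values and a depth of 3, the data falls into nine full bins, so no partial bin arises for this data set.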
Q7. Using the data for age given in Q6, answer the following:
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation
of age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.
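The three normalizations in Q7 can be sketched as below to check hand-computed values. This is a sketch under stated assumptions: the function names are illustrative, the standard deviation 12.94 is taken from the question as given rather than recomputed, the mean is computed from the Q6 data, and decimal scaling divides by 10^j for the smallest integer j that brings the largest absolute value below 1.

```python
# Age values from Q6.
AGE = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
       33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def min_max(value, data, new_min=0.0, new_max=1.0):
    """Min-max normalization of `value` onto [new_min, new_max]."""
    lo, hi = min(data), max(data)
    return (value - lo) / (hi - lo) * (new_max - new_min) + new_min

def z_score(value, data, std_dev):
    """Z-score normalization using the data mean and a supplied standard deviation."""
    mean = sum(data) / len(data)
    return (value - mean) / std_dev

def decimal_scaling(value, data):
    """Divide by 10**j, with j the smallest integer such that max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in data)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j

if __name__ == "__main__":
    print("min-max:", round(min_max(35, AGE), 4))
    print("z-score:", round(z_score(35, AGE, std_dev=12.94), 4))
    print("decimal scaling:", decimal_scaling(35, AGE))
```

Passing the standard deviation as a parameter keeps the sketch consistent with the value stated in the question instead of committing to a sample or population formula.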
Q8. What are the value ranges of the following normalization methods?
(a) min-max normalization
(b) z-score normalization
(c) z-score normalization using the mean absolute deviation instead of standard deviation
(d) normalization by decimal scaling