1 Assignment

The document outlines an assignment with multiple questions related to data mining functionalities, including definitions and examples of characterization, discrimination, association, classification, regression, clustering, and outlier analysis. It also discusses the importance of data mining for business success, methods for handling missing values, and techniques for data smoothing and normalization. Additionally, it includes calculations for similarity measures and normalization methods for a given dataset of ages.

Uploaded by

divinexhumane
Copyright
© All Rights Reserved
Assignment-1

Q1. Define each of the following data mining functionalities: characterization,
discrimination, association and correlation analysis, classification, regression, clustering, and
outlier analysis. Give examples of each data mining functionality, using a real-life database
that you are familiar with.

Q2. Present an example where data mining is crucial to the success of a business. What data
mining functionalities does this business need (e.g., think of the kinds of patterns that could
be mined)? Can such patterns be generated alternatively by data query processing or simple
statistical analysis?

Q3. Briefly outline how to compute the dissimilarity between objects described by the
following:
(a) Nominal attributes
(b) Asymmetric binary attributes
(c) Numeric attributes
(d) Term-frequency vectors
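
As a starting point for (a) and (b), here is a minimal sketch assuming the common textbook definitions: for nominal attributes, the fraction of attributes on which two objects mismatch; for asymmetric binary attributes, the mismatch count divided by the number of attributes where at least one object is 1 (negative matches are ignored). The function names and sample objects are hypothetical and only illustrate the arithmetic.

```python
def nominal_dissimilarity(x, y):
    """d = (p - m) / p, where p is the attribute count and m the match count."""
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p


def asymmetric_binary_dissimilarity(x, y):
    """d = (r + s) / (q + r + s); 0/0 negative matches are not counted."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # only x is 1
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # only y is 1
    if q + r + s == 0:
        return None  # undefined when neither object has any 1
    return (r + s) / (q + r + s)


# Hypothetical sample objects, not taken from the assignment.
print(nominal_dissimilarity(("red", "S", "cotton"), ("red", "M", "cotton")))
print(asymmetric_binary_dissimilarity((1, 0, 1, 0, 0), (1, 1, 0, 0, 0)))
```

Numeric attributes are usually compared with a distance such as Euclidean or Manhattan, and term-frequency vectors with cosine similarity.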

Q4. For the following vectors, x and y, calculate the indicated similarity or distance
measures.
(a) x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
(b) x = (0, 1, 0, 1), y = (1, 0, 1, 0): cosine, correlation, Euclidean, Jaccard
(c) x = (0, −1, 0, 1), y = (1, 0, −1, 0): cosine, correlation, Euclidean
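
One way to check hand calculations for these measures is a short script. The sketch below assumes the standard definitions (Pearson correlation, and Jaccard computed over binary vectors as matching 1s divided by positions where either vector is 1); the function names are illustrative. Correlation is returned as `None` when a vector has zero variance, as happens for both vectors in (a).

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def correlation(x, y):
    """Pearson correlation; None if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return None  # 0/0: correlation is undefined
    return cov / (dev_x * dev_y)


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard(x, y):
    """Jaccard coefficient for binary (0/1) vectors."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else None


pairs = {
    "a": ((1, 1, 1, 1), (2, 2, 2, 2)),
    "b": ((0, 1, 0, 1), (1, 0, 1, 0)),
    "c": ((0, -1, 0, 1), (1, 0, -1, 0)),
}
for label, (x, y) in pairs.items():
    print(label, cosine(x, y), correlation(x, y), euclidean(x, y))
print("b Jaccard", jaccard(*pairs["b"]))
```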

Q5. In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.

Q6. Given the following data (in increasing order) for the attribute age:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40,
45, 46, 52, 70.
(a) Use smoothing by bin means to smooth the above data, using a bin depth of 3. Illustrate
your steps. Comment on the effect of this technique for the given data.
(b) How might you determine outliers in the data?
(c) What other methods are there for data smoothing?
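
For part (a), the procedure can be sketched as follows, assuming "bin depth of 3" means equal-frequency bins of three consecutive sorted values, each value then replaced by its bin's mean. The function name is illustrative.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(sorted_values, depth):
    """Partition sorted data into bins of `depth` values and replace
    every value with the mean of its bin."""
    smoothed = []
    for start in range(0, len(sorted_values), depth):
        bin_values = sorted_values[start:start + depth]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


smoothed = smooth_by_bin_means(ages, 3)
# Show each bin next to its mean so the steps are visible.
for start in range(0, len(ages), 3):
    print(ages[start:start + 3], "->", round(smoothed[start], 2))
```

With 27 values and a depth of 3, this gives nine bins; the last bin (46, 52, 70) shows how a single large value pulls up the mean of its neighbors.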

Q7. Using the data for age given in Q6, answer the following:
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation
of age is 12.94 years.
(c) Use normalization by decimal scaling to transform the value 35 for age.
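
The three transformations can be sketched in a few lines. This is an illustration under the usual definitions, not a prescribed solution: min-max uses the observed minimum and maximum of the data, z-score uses the sample mean together with the standard deviation of 12.94 stated in (b), and decimal scaling divides by the smallest power of 10 that brings every absolute value below 1. Function names are illustrative.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def min_max(value, values, new_min=0.0, new_max=1.0):
    lo, hi = min(values), max(values)
    return (value - lo) / (hi - lo) * (new_max - new_min) + new_min


def z_score(value, mean, std_dev):
    return (value - mean) / std_dev


def decimal_scaling(value, values):
    """Divide by 10**j, with j the smallest integer such that
    max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j


mean_age = sum(ages) / len(ages)                 # 809 / 27
print(round(min_max(35, ages), 3))               # (35 - 13) / (70 - 13)
print(round(z_score(35, mean_age, 12.94), 3))    # std dev as given in (b)
print(decimal_scaling(35, ages))                 # 35 / 100
```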

Q8. What are the value ranges of the following normalization methods?
(a) min-max normalization
(b) z-score normalization
(c) z-score normalization using the mean absolute deviation instead of standard deviation
(d) normalization by decimal scaling
