CS 431 Quiz 7 Solution
1. Data normalization:
a. (2 points) Give an advantage of data normalization for data mining.
Data normalization ensures that the values of all attributes or predictors vary within a
narrow, comparable range. This prevents an attribute with a wide spread of values from
dominating the others when mining tasks such as distance-based clustering are applied. Many
statistical approaches also require that data be normalized before they are applied.
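The dominance effect can be illustrated with a minimal sketch. The records below (age in years, income in dollars) are hypothetical, and min-max scaling to [0, 1] is only one of several possible normalization schemes.

```python
import math

def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical records: (age in years, income in dollars).
records = [(25, 30000), (60, 31000), (27, 60000)]

ages = min_max_normalize([r[0] for r in records])
incomes = min_max_normalize([r[1] for r in records])
normalized = list(zip(ages, incomes))

# Before normalization, income differences swamp age differences.
raw_d01 = euclidean(records[0], records[1])
raw_d02 = euclidean(records[0], records[2])

# After normalization, both attributes contribute on the same scale.
norm_d01 = euclidean(normalized[0], normalized[1])
norm_d02 = euclidean(normalized[0], normalized[2])

print(raw_d01, raw_d02)
print(norm_d01, norm_d02)
```

On the raw data the first record looks far closer to the second than to the third, purely because income is measured in larger units. After scaling, the 35-year age gap counts about as much as the 30000-dollar income gap.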
2. (2 points) How are wavelets (or wavelet transforms) used for data reduction?
Briefly describe the conceptual procedure.
Wavelets are used to transform data into a different domain, in which the data is
represented as a weighted sum of basis functions. The basis functions are called wavelet
functions. These functions are chosen so that the wavelet representation of the data is
compact, that is, fewer data points (the coefficients of the functions) are needed to
reproduce a good approximation of the original data. Thus, once the data is wavelet
transformed, the smallest coefficients can be discarded, giving a significant decrease in
size without an appreciable loss in accuracy.
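The procedure above can be sketched with the Haar wavelet, the simplest wavelet basis. This is an illustrative sketch, not a production transform: it assumes the input length is a power of two, uses unnormalized pairwise averages and differences, and ranks coefficients by raw magnitude. The function names and the sample signal are made up for the example.

```python
def haar_forward(data):
    """Full Haar decomposition via pairwise averages and differences.
    Assumes len(data) is a power of two."""
    coeffs = list(data)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        # Averages go in front and are decomposed again on the next pass.
        coeffs[:n] = averages + details
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Rebuild the data from coefficients produced by haar_forward."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        merged = []
        for a, d in zip(data[:n], data[n:2 * n]):
            merged.extend([a + d, a - d])
        data[:2 * n] = merged
        n *= 2
    return data

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

signal = [2, 2, 0, 2, 3, 5, 4, 4]        # small illustrative signal
coeffs = haar_forward(signal)            # same length as the signal
reduced = keep_largest(coeffs, 4)        # only 4 nonzero values to store
approx = haar_inverse(reduced)           # approximation of the signal

print(coeffs)
print(approx)
```

Only the nonzero coefficients and their positions need to be stored, which is where the reduction in size comes from. Keeping all coefficients reproduces the signal exactly.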
4. (2 points) Suppose you are the new IT manager at XYZ Ltd, an aggressive web
advertising company. The company has been collecting web statistics but has not
been able to make use of them because of their large volume. Give a specific example
in which you might evaluate the performance of a given web ad using data mining.
There are several ways in which web log data can be used to understand ad effectiveness
and web surfer behavior. One example is to mine the log data for association rules
describing surfer clicking/viewing behavior (e.g., views page X AND views page Y -> clicks
ad Z [support s = 2%; confidence c = 60%]). Another example is to find groups (clusters)
of web pages from which a particular group of ads is viewed. This knowledge can help the
manager design and place ads better for improved response.
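As a minimal sketch of how the support and confidence of such a rule could be computed, assume the web log has already been grouped into sessions, each a set of events. The session data and the names page_X, page_Y, and ad_Z are hypothetical.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all sessions containing both sides of the rule
    confidence = fraction of antecedent sessions that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for s in sessions if antecedent <= s)
    n_both = sum(1 for s in sessions if (antecedent | consequent) <= s)
    support = n_both / len(sessions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical sessions: pages viewed and ads clicked by each visitor.
sessions = [
    {"page_X", "page_Y", "ad_Z"},
    {"page_X", "page_Y", "ad_Z"},
    {"page_X", "page_Y"},
    {"page_X"},
    {"page_Y", "ad_Z"},
]

support, confidence = rule_metrics(sessions, {"page_X", "page_Y"}, {"ad_Z"})
print(support, confidence)
```

Here 2 of the 5 sessions contain both pages and the ad click, and 3 sessions contain both pages, so the rule has a support of 40% and a confidence of about 67%. A rule with high confidence suggests that placing ad Z on pages X and Y reaches visitors who are likely to click it.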