CS 431 Quiz 7 Solution

This document contains the solutions to the questions on a CS 431 quiz. It covers the following topics:
- Data normalization: how it keeps attribute values within a narrow range, prevents one attribute from dominating another, and satisfies the requirements of many statistical approaches.
- Computing the z-score normalized value of 25 for a distribution with mean 13 and standard deviation 4.
- Wavelet transforms for data reduction: the data is represented compactly by a small number of coefficients, so some coefficients can be discarded with little loss of accuracy and a significant decrease in size.
- Why the itemset {A, B, C, D} is not necessarily frequent when {B, D} is frequent, in light of the Apriori principle.

Uploaded by

Ala mia

1. Data normalization:
a. (2 points) Give an advantage of data normalization for data mining.
Data normalization ensures that the values of all attributes vary within a narrow, common
range. This prevents an attribute with a wide spread of values from dominating the others
when mining tasks such as distance-based clustering are applied. In addition, many
statistical approaches require that the data be normalized before they are applied.
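
The domination effect described above can be illustrated with a minimal sketch. The records (age, annual income) and the choice of min-max scaling to [0, 1] are assumptions made for illustration only; they are not part of the quiz.

```python
import math

def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical records: age in years, annual income in dollars.
ages = [25, 60, 27]
incomes = [50_000, 51_000, 90_000]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max_normalize(ages), min_max_normalize(incomes)))

# On raw values, income dominates: the 35-year age gap between
# records 0 and 1 barely registers next to the income differences.
raw_d01 = euclidean(raw[0], raw[1])
raw_d02 = euclidean(raw[0], raw[2])

# After scaling, both attributes contribute on the same footing.
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print(raw_d01, raw_d02)
print(scaled_d01, scaled_d02)
```

On the raw values, record 1 appears far closer to record 0 than record 2 does, purely because income is measured on a larger scale. After scaling, the two distances are nearly equal.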

b. (2 points) Compute the z-score normalized value of 25 for a distribution with mean 13
and standard deviation 4.
The z-score of 25 is (25 - 13)/4 = 12/4 = 3.
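
The same computation as a small function, shown only as a sketch; the function name z_score is an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev

# Values from the quiz question: value 25, mean 13, standard deviation 4.
print(z_score(25, 13, 4))  # 3.0
```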

2. (2 points) How are wavelets (or, how are wavelet transforms) used for data reduction?
Briefly describe the conceptual procedure.
Wavelets are used to transform data to a different domain in which the data is represented
as a weighted sum of basis functions, called wavelet functions. These functions are chosen
so that the wavelet representation of the data is compact, that is, fewer data points (the
coefficients of the functions) are needed to reproduce a good approximation of the
original data. Thus, once the data is wavelet transformed, some of the coefficients
(typically those of smallest magnitude) can be discarded without appreciable loss in
accuracy but with a significant decrease in size.
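
The procedure can be sketched with the Haar wavelet, the simplest wavelet family. The quiz does not name a particular wavelet, so the choice of Haar, the unnormalized averaging/differencing form, the sample data, and the number of retained coefficients are all assumptions made for illustration. With an orthonormal transform, magnitude-based truncation is better justified; the unnormalized form is used here only because the arithmetic is easy to follow.

```python
def haar_forward(data):
    """Haar decomposition by repeated pairwise averages and half-differences.

    Assumes len(data) is a power of two.
    """
    coeffs = [float(v) for v in data]
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward: rebuild each pair from its average and detail."""
    data = list(coeffs)
    n = 1
    while n < len(data):
        rebuilt = []
        for avg, det in zip(data[:n], data[n:2 * n]):
            rebuilt.extend([avg + det, avg - det])
        data[:2 * n] = rebuilt
        n *= 2
    return data

def keep_largest(coeffs, k):
    """Zero out all but the k coefficients of largest magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

original = [2, 2, 0, 2, 3, 5, 4, 4]            # hypothetical data
coeffs = haar_forward(original)                 # 8 coefficients
reduced = keep_largest(coeffs, 4)               # keep only 4 of them
approximation = haar_inverse(reduced)
max_error = max(abs(a - b) for a, b in zip(original, approximation))

print(coeffs)
print(approximation, max_error)
```

Only the nonzero coefficients (and their positions) need to be stored, which is where the reduction in size comes from; the inverse transform then yields an approximation of the original data.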

3. (2 points) If itemset {B, D} is frequent, will itemset {A, B, C, D} be frequent as well?
Explain.
No, itemset {A, B, C, D} is not necessarily frequent given that itemset {B, D} is
frequent. By the Apriori principle, if an itemset is frequent then all of its subsets are
frequent as well. The converse, however, is not true in general. Moreover, the support of
any superset of a frequent itemset is less than or equal to that of the frequent itemset,
so the support of {A, B, C, D} may fall below the minimum support threshold.
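
A small counterexample makes this concrete. The transactions and the minimum support threshold of 60% below are hypothetical values chosen for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Hypothetical transaction database.
transactions = [
    {"A", "B", "C", "D"},
    {"B", "D"},
    {"B", "D", "E"},
    {"A", "B", "D"},
    {"C", "E"},
]
min_support = 0.6  # assumed threshold

support_bd = support({"B", "D"}, transactions)              # 4 of 5 transactions
support_abcd = support({"A", "B", "C", "D"}, transactions)  # 1 of 5 transactions

print(support_bd >= min_support)    # True: {B, D} is frequent
print(support_abcd >= min_support)  # False: {A, B, C, D} is not
```

Every transaction that contains {A, B, C, D} also contains {B, D}, so the superset's support can never exceed the subset's.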

4. (2 points) Suppose you are the new IT manager at XYZ Ltd, an aggressive web
advertising company. The company has been collecting web statistics but has not
been able to make use of them because of their large volume. Give a specific example
in which you might evaluate the performance of a given web ad using data mining.
There are several ways in which web log data can be used to understand ad effectiveness
and web surfer behavior. One example is to mine the log data for association rules
among surfer clicking/viewing behavior (for example, views page X AND views page Y ->
clicks ad Z [support = 2%; confidence = 60%]). Another example is finding groups
(clusters) of web pages from which a particular group of ads is viewed. This knowledge
can help the manager design and place ads for improved response.
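
The support and confidence of a rule such as the one above can be computed directly from session data. The following is a minimal sketch; the session contents and the names page_X, page_Y, and ad_Z are hypothetical, and a real web log would first have to be grouped into per-visitor sessions.

```python
def rule_metrics(antecedent, consequent, sessions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for s in sessions if antecedent <= s)
    n_both = sum(1 for s in sessions if (antecedent | consequent) <= s)
    support = n_both / len(sessions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical sessions: the set of pages viewed and ads clicked per visit.
sessions = [
    {"page_X", "page_Y", "ad_Z"},
    {"page_X", "page_Y", "ad_Z"},
    {"page_X", "page_Y"},
    {"page_X"},
    {"page_Y", "ad_Z"},
]

s, c = rule_metrics({"page_X", "page_Y"}, {"ad_Z"}, sessions)
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Here support is the share of all sessions in which both pages were viewed and the ad was clicked, while confidence is the share of sessions viewing both pages that went on to click the ad.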
