Data Mining Guidelines

This document outlines the guidelines for a course on data mining. It includes 7 topics that will be covered in the course, such as data preprocessing, association analysis, and cluster analysis. It provides the chapter and page references for each topic. It also lists the course books and additional references. The practical assignments involve applying data mining algorithms and techniques using the Weka tool on various datasets. Students need to prepare reports on their experiments following a prescribed format.

Uploaded by

Manish Sagar

B.Sc. (Hons.) Computer Science (w.e.f. 2011)

CSHT 616 (IV) Data Mining Guidelines

| Sr. No. | Units | Topics | Chapter | Reference | No. of Lectures |
|---|---|---|---|---|---|
| 1. | Introduction | 1.1 - Why Data Mining?, 1.2 - What Is Data Mining?, 1.4 - What Kinds of Patterns Can Be Mined? | 1 | 1 | 2 |
| 2. | Data Preprocessing | 2.1 - Data Objects and Attribute Types, 2.2 - Basic Statistical Descriptions of Data (up to page 51) | 2 | 1 | 2 |
| 3. | Data Preprocessing | 2.2 - Data Quality, 2.3 - Data Preprocessing | 2 | 2 | 6 |
| 4. | Association Analysis | 6.1 - Basic Concepts, 6.2 - Frequent Itemset Mining Methods (up to page 259) | 6 | 1 | 12 |
| 5. | Classification Analysis | 4.1 - Preliminaries, 4.2 - General Approach to Solving a Classification Problem, 4.3 - Decision Tree Induction (up to page 165), 4.5 - Evaluating the Performance of a Classifier | 4 | 2 | 7 |
| 6. | Classification Analysis | 5.1 - Rule Based Classifier (up to page 212), 5.2 - Nearest Neighbor Classifiers, 5.3 - Bayesian Classifiers (complete for discrete data and only the introduction of the Bayes classifier for continuous attributes, up to page 233), 5.7.1 - Alternative Metrics | 5 | 2 | 8 |
| 7. | Cluster Analysis | 10.1 - Cluster Analysis, 10.2 - Partition Methods, 10.3 - Hierarchical Methods (up to page 462), 10.4 - Density Based Methods (up to page 473) | 10 | 1 | 11 |

Course Books:

1. Data Mining: Concepts and Techniques, 3rd edition, Jiawei Han and Micheline Kamber
2. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson Education.

References:

3. Data Mining: A Tutorial Based Primer, Richard Roiger, Michael Geatz, Pearson Education 2003.
4. Introduction to Data Mining with Case Studies, G.K. Gupta, PHI 2006
5. Insight into Data Mining: Theory and Practice, Soman K. P., Diwakar Shyam, Ajay V., PHI 2006

Practical List: Practicals are to be done using Weka, and a report is to be prepared as per the format*. The operations are to be performed on the built-in dummy datasets of Weka and/or the downloadable datasets mentioned in the references below. Wherever applicable, the parameter values are to be varied (up to 3 distinct values). The 'Visualize' tab is to be explored with each operation.

1. Preprocessing: Apply the following filters

weka>filters>supervised>attribute>

• AddClassification
• AttributeSelection
• Discretize
• NominalToBinary

weka>filters>supervised>instance>

• StratifiedRemoveFolds
• Resample

weka>filters>unsupervised>attribute>

• Add
• AddExpression
• AddNoise
• Center
• Discretize
• MathExpression
• MergeTwoValues
• NominalToBinary
• NominalToString
• Normalize
• NumericToBinary
• NumericToNominal
• NumericTransform
• PrincipalComponents
• RandomSubset
• Remove
• RemoveType
• ReplaceMissingValues
• Standardize

weka>filters>unsupervised>instance>

• Normalize
• Randomize
• Standardize
• RemoveFrequentValues
• RemoveWithValues
• Resample
• SubsetByExpression
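
To make the effect of one of these filters concrete, the following is a minimal Python sketch of equal-width binning, the idea behind the unsupervised Discretize filter. It is not Weka's implementation. The function name `equal_width_bins`, the sample values, and the choice of equal-width (rather than equal-frequency) binning are assumptions made for illustration.

```python
def equal_width_bins(values, bins=3):
    """Map each numeric value to a bin index in 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # min() keeps the maximum value inside the last bin instead of one past it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical attribute values, not taken from any real dataset.
sample = [1.0, 1.5, 2.0, 4.0, 5.5, 7.0]
print(equal_width_bins(sample, bins=3))  # → [0, 0, 0, 1, 2, 2]
```

Calling the function with three different `bins` values mirrors the requirement to vary a parameter over up to 3 distinct values and record the effect.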

2. Explore the 'Select attributes' tab as follows

weka>attributeSelection>

• FilteredSubsetEval
• WrapperSubsetEval

3. Association mining

weka>associations>

• Apriori
• FPGrowth
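
The level-wise idea behind Apriori can be sketched in a few lines of Python. This is an illustrative sketch only, not Weka's Apriori class: the function name `frequent_itemsets`, the use of an absolute support count as the threshold, and the market-basket transactions are all assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result


# Hypothetical transactions for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
for itemset, count in sorted(frequent_itemsets(baskets, 3).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

With a minimum support count of 3, only the four single items and the pair {bread, milk} survive; lowering the threshold shows how quickly the number of frequent itemsets grows.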

4. Classification**

weka>classifiers>bayes>

• NaiveBayes

weka>classifiers>lazy>

• IB1
• IBk

weka>classifiers>trees>

• SimpleCart
• RandomTree
• Id3
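
As a rough illustration of the lazy classifiers listed above, the sketch below implements a plain k-nearest-neighbour vote in Python. It assumes Euclidean distance on unscaled numeric attributes and a simple majority vote; Weka's IB1 and IBk may differ in distance handling and tie-breaking. The function name `knn_predict` and the training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train is a list of (features, label); return the majority label of the k nearest rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0), k=3))  # → a
print(knn_predict(train, (5.0, 5.1), k=1))  # → b
```

Setting `k=1` corresponds to the single-nearest-neighbour case, and larger values of `k` give the general case, which is one natural parameter to vary in the report.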

5. Clustering**

weka>clusterers>

• SimpleKMeans
• FarthestFirst
• DBSCAN
• HierarchicalClusterer
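
The partitioning approach used by k-means can be sketched as follows. This is a minimal Python version of the standard assign-then-update loop, not Weka's SimpleKMeans: it takes the initial centroids as an explicit argument so the run is deterministic (an assumption made here for reproducibility), and the function name `k_means` and the sample points are hypothetical.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Run the assign/update loop until centroids stop moving; return (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignments


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centres, labels = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Changing the number or position of the initial centroids is a simple way to observe how sensitive the resulting clusters are to initialisation.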
*Prescribed format:

| Dataset | Task | Algorithm/Filter | Parameters | Observations | Inference | Remarks |
|---|---|---|---|---|---|---|
| Iris | Preprocessing | Unsupervised -> Standardize | ignoreClass: false | For all attributes, mean = 0 and standard deviation = 1 | The dataset has been standardized | |
| Iris | Preprocessing | Unsupervised -> Normalize | scale: 1.0, translation: 0.0 | The minimum and maximum of all attributes are 0 and 1 respectively | The dataset has been normalized in the [0,1] range | |
| Iris | Preprocessing | Unsupervised -> Normalize | scale: 2.0, translation: 0.0 | The minimum and maximum of all attributes are 0 and 2 respectively | The dataset has been normalized in the [0,2] range | |
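
The observations in the sample rows above can be checked by hand with a short Python sketch. The formulas below are the usual z-score and min-max rescaling, with `scale` and `translation` named after the parameters in the table. The use of the sample standard deviation (dividing by n - 1), the function names, and the example column are assumptions for illustration, not a statement of how Weka computes them.

```python
import statistics


def standardize(values):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1); an assumption
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def normalize(values, scale=1.0, translation=0.0):
    """Min-max rescale a numeric column into [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]


# Hypothetical attribute column.
column = [4.0, 5.0, 6.0, 7.0, 8.0]
print(normalize(column))             # → [0.0, 0.25, 0.5, 0.75, 1.0]
print(normalize(column, scale=2.0))  # → [0.0, 0.5, 1.0, 1.5, 2.0]
z = standardize(column)
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```

The two `normalize` calls correspond to the second and third sample rows: the same filter with `scale` set to 1.0 and 2.0 yields ranges of [0,1] and [0,2].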

** Proper graphs are to be drawn to compare the accuracies achieved by the variations mentioned below.

• Applying different algorithms to the same dataset.
• 10%, 20%, 30%, 40% and 50% noise.
• Applying different datasets to the same algorithm.
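
One possible way to produce the noise levels listed above is sketched below in Python, assuming the noise bullet refers to class-label noise: a given percentage of class labels is changed to a different class before training. This only mimics the idea of the AddNoise filter from the preprocessing list and is not its implementation. The function name `add_label_noise`, the fixed seed, and the toy labels are assumptions.

```python
import random


def add_label_noise(labels, percent, classes, seed=1):
    """Return a copy of labels with `percent`% of the entries changed to a different class."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    noisy = list(labels)
    n_flip = round(len(labels) * percent / 100)
    for i in rng.sample(range(len(labels)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


# Hypothetical class column with two classes.
labels = ["a"] * 10 + ["b"] * 10
for pct in (10, 20, 30, 40, 50):
    noisy = add_label_noise(labels, pct, classes=["a", "b"])
    changed = sum(x != y for x, y in zip(labels, noisy))
    print(pct, changed)
```

Training the same classifier on each noisy copy and plotting accuracy against the noise percentage gives one of the comparison graphs the note asks for.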

References for the data sets to be used for the experiments:

1. http://archive.ics.uci.edu/ml/
2. http://www.kdnuggets.com/datasets/index.html
3. https://wiki.csc.calpoly.edu/datasets/wiki/apriori (for Association Mining)
