Data Mining Guidelines
2011)
Sr. No.  Unit            Topics                                                      Chapter  Reference  No. of Lectures
1.       Introduction    1.1 - Why Data Mining?, 1.2 - What Is Data Mining?,         1        1          2
                         1.4 - What Kinds of Patterns Can Be Mined?
2.       Data            2.1 - Data Objects and Attribute Types, 2.2 - Basic         2        1          2
         Preprocessing   Statistical Descriptions of Data (up to page 51)
3.       Data            2.2 - Data Quality, 2.3 - Data Preprocessing                2        2          6
         Preprocessing
4.       Association     6.1 - Basic Concepts, 6.2 - Frequent Itemset Mining         6        1          12
         Analysis        Methods (up to page 259)
5.       Classification  4.1 - Preliminaries, 4.2 - General Approach to Solving      4        2          7
         Analysis        a Classification Problem, 4.3 - Decision Tree Induction
                         (up to page 165), 4.5 - Evaluating the Performance of a
                         Classifier
6.       Classification  5.1 - Rule-Based Classifier (up to page 212),               5        2          8
         Analysis        5.2 - Nearest Neighbor Classifiers,
                         5.3 - Bayesian Classifiers (complete for discrete data;
                         introduction only for the Bayes classifier with
                         continuous attributes), up to page 233,
                         5.7.1 - Alternative Metrics
7.       Cluster         10.1 - Cluster Analysis, 10.2 - Partitioning Methods,       10       1          11
         Analysis        10.3 - Hierarchical Methods (up to page 462),
                         10.4 - Density-Based Methods (up to page 473)
Course Books:
1. Data Mining: Concepts and Techniques, 3rd edition, Jiawei Han and Micheline Kamber
2. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson
Education.
References:
3. Data Mining: A Tutorial Based Primer, Richard Roiger, Michael Geatz, Pearson Education 2003.
4. Introduction to Data Mining with Case Studies, G.K. Gupta, PHI 2006
5. Insight into Data Mining: Theory and Practice, Soman K. P., Diwakar Shyam, Ajay V., PHI 2006
Practical List: Practicals are to be done using Weka, and a report is to be prepared as per the
format*. The operations are to be performed on the built-in sample data sets of Weka and/or the
downloadable datasets mentioned in the references below. Wherever applicable, the parameter values
are to be varied (up to 3 distinct values). The 'Visualize' tab is to be explored with each operation.
weka>filters>supervised>attribute>
AddClassification
AttributeSelection
Discretize
NominalToBinary
weka>filters>supervised>instance>
StratifiedRemoveFolds
Resample
weka>filters>unsupervised>attribute>
Add
AddExpression
AddNoise
Center
Discretize
MathExpression
MergeTwoValues
NominalToBinary
NominalToString
Normalize
NumericToBinary
NumericToNominal
NumericTransform
PrincipalComponents
RandomSubset
Remove
RemoveType
ReplaceMissingValues
Standardize
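To make two of the attribute filters listed above more concrete, here is a minimal plain-Python sketch of the arithmetic behind Normalize (rescaling a numeric attribute to the range [0, 1]) and Standardize (rescaling to zero mean and unit standard deviation). It is not Weka's code: the function names and the sample `ages` column are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def normalize(values):
    """Min-max rescale a numeric column to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


ages = [20, 30, 40, 60]  # hypothetical attribute values
print(normalize(ages))
print(standardize(ages))
```

Comparing the two outputs for the same column is a quick way to see why the choice of filter matters for distance-based methods.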
weka>filters>unsupervised>instance>
Normalize
Randomize
Standardize
RemoveFrequentValues
RemoveWithValues
Resample
SubsetByExpression
weka>attributeSelection>
FilteredSubsetEval
WrapperSubsetEval
3. Association mining
weka>associations>
Apriori
FPGrowth
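As a study aid for the Apriori practical, the following plain-Python sketch shows the level-wise idea: count candidate itemsets, keep those that meet a minimum support count, then join and prune to form the next level. It is an illustration only, not Weka's implementation; the function name, the absolute support count and the small market-basket data set are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for itemset, count in sorted(apriori(baskets, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Rerunning the sketch with a different `min_support` mirrors the instruction to vary parameter values and compare the results.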
4. Classification**
weka>classifiers>bayes>
NaiveBayes
weka>classifiers>lazy>
IB1
IBk
weka>classifiers>trees
SimpleCart
RandomTree
ID3
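IBk in the list above is Weka's k-nearest-neighbour classifier. The sketch below, in plain Python, illustrates the idea and how accuracy can change as the parameter k is varied over three values. It is not Weka's implementation; the function names, the Euclidean distance choice and the tiny data set (which includes one deliberately mislabelled training point) are assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to the query."""
    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


train = [  # hypothetical (features, label) pairs
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((1.05, 1.0), "b"),  # mislabelled point acting as noise
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b"),
]
test = [((1.1, 1.0), "a"), ((3.1, 3.0), "b")]

for k in (1, 3, 5):  # three distinct parameter values
    print(k, accuracy(train, test, k))
```

With k = 1 the noisy point decides the first test case, while larger k values outvote it; this is the kind of comparison the accuracy graphs are meant to show.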
5. Clustering**
weka>clusterers>
SimpleKMeans
FarthestFirst
DBSCAN
HierarchicalClusterer
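For the k-means practical, the loop of assigning points to the nearest centroid and then recomputing the centroids can be sketched in plain Python as follows. This is an illustration rather than Weka's SimpleKMeans: the function name and sample points are hypothetical, and seeding with the first k points (instead of a random choice) is an assumption made to keep the run repeatable.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Return (centroids, clusters) after Lloyd-style k-means iterations."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points as seeds
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


points = [  # hypothetical 2-D data with two visible groups
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

Trying k = 2, 3 and 4 on the same points is one way to practise varying the parameter and inspecting how the clusters change.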
*Prescribed format:
** Proper graphs are to be drawn to compare the accuracies achieved by the variations mentioned below.
1. http://archive.ics.uci.edu/ml/
2. http://www.kdnuggets.com/datasets/index.html
3. https://wiki.csc.calpoly.edu/datasets/wiki/apriori (for Association Mining)