Data Mining Guidelines

This document outlines the guidelines for a course on data mining. It includes 7 topics that will be covered in the course, such as data preprocessing, association analysis, and cluster analysis. It provides the chapter and page references for each topic. It also lists the course books and additional references. The practical assignments involve applying data mining algorithms and techniques using the Weka tool on various datasets. Students need to prepare reports on their experiments following a prescribed format.

Uploaded by Manish Sagar


B.Sc. (Hons.) Computer Science (w.e.f. 2011)

CSHT 616 (IV) Data Mining Guidelines

| Sr. No. | Unit | Topics | Chapter | Reference (Course Book) | No. of Lectures |
|---|---|---|---|---|---|
| 1 | Introduction | 1.1 Why Data Mining?, 1.2 What Is Data Mining?, 1.4 What Kinds of Patterns Can Be Mined? | 1 | 1 | 2 |
| 2 | Data Preprocessing | 2.1 Data Objects and Attribute Types, 2.2 Basic Statistical Descriptions of Data (up to page 51) | 2 | 1 | 2 |
| 3 | Data Preprocessing | 2.2 Data Quality, 2.3 Data Preprocessing | 2 | 2 | 6 |
| 4 | Association Analysis | 6.1 Basic Concepts, 6.2 Frequent Itemset Mining Methods (up to page 259) | 6 | 1 | 12 |
| 5 | Classification Analysis | 4.1 Preliminaries, 4.2 General Approach to Solving a Classification Problem, 4.3 Decision Tree Induction (up to page 165), 4.5 Evaluating the Performance of a Classifier | 4 | 2 | 7 |
| 6 | Classification Analysis | 5.1 Rule-Based Classifier (up to page 212), 5.2 Nearest-Neighbor Classifiers, 5.3 Bayesian Classifiers (complete for discrete data; only the introduction of the Bayes classifier for continuous attributes; up to page 233), 5.7.1 Alternative Metrics | 5 | 2 | 8 |
| 7 | Cluster Analysis | 10.1 Cluster Analysis, 10.2 Partitioning Methods, 10.3 Hierarchical Methods (up to page 462), 10.4 Density-Based Methods (up to page 473) | 10 | 1 | 11 |
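Topic 6 covers the Bayesian classifier for discrete data in full. The Python sketch below illustrates the counting behind a naive Bayes classifier on nominal attributes: class priors and per-class value frequencies are counted, and the class with the largest product P(C) * prod_i P(x_i | C) is predicted. It is not Weka's NaiveBayes implementation; the add-one (Laplace) smoothing, the function names and the toy weather records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> Counter of values
    domains = defaultdict(set)  # attribute index -> distinct values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict_naive_bayes(model, row):
    """Return the class maximising P(C) * prod_i P(x_i | C), with add-one smoothing."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, value in enumerate(row):
            # Laplace estimate of P(x_i | C): (count + 1) / (n_c + number of distinct values)
            score *= (value_counts[(i, label)][value] + 1) / (n_c + len(domains[i]))
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Hypothetical toy data: (outlook, temperature) -> play
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))   # no
print(predict_naive_bayes(model, ("rainy", "cool")))  # yes
```

For ("sunny", "hot") the scores are 0.4 * 3/5 * 2/5 = 0.096 for "no" against 0.6 * 1/6 * 2/6 ≈ 0.033 for "yes", so "no" is predicted.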

Course Books:

1. Data Mining: Concepts and Techniques, 3rd edition, Jiawei Han, Micheline Kamber and Jian Pei, Morgan Kaufmann.
2. Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson Education.

References:

3. Data Mining: A Tutorial-Based Primer, Richard Roiger, Michael Geatz, Pearson Education, 2003.
4. Introduction to Data Mining with Case Studies, G. K. Gupta, PHI, 2006.
5. Insight into Data Mining: Theory and Practice, K. P. Soman, Shyam Diwakar, V. Ajay, PHI, 2006.

Practical List: Practicals are to be done using Weka, and a report is to be prepared in the prescribed format*. The operations are to be performed on the built-in sample datasets that ship with Weka and/or on the downloadable datasets listed in the references below. Wherever applicable, parameter values are to be varied (up to 3 distinct values). The 'Visualize' tab is to be explored with each operation.

1. Preprocessing: apply the following filters.

weka>filters>supervised>attribute>

- AddClassification
- AttributeSelection
- Discretize
- NominalToBinary

weka>filters>supervised>instance>

- StratifiedRemoveFolds
- Resample

weka>filters>unsupervised>attribute>

- Add
- AddExpression
- AddNoise
- Center
- Discretize
- MathExpression
- MergeTwoValues
- NominalToBinary
- NominalToString
- Normalize
- NumericToBinary
- NumericToNominal
- NumericTransform
- PrincipalComponents
- RandomSubset
- Remove
- RemoveType
- ReplaceMissingValues
- Standardize

weka>filters>unsupervised>instance>

- Normalize
- Randomize
- Standardize
- RemoveFrequentValues
- RemoveWithValues
- Resample
- SubsetByExpression
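The unsupervised Discretize filter in the list above turns numeric attributes into nominal ones by binning. As a rough illustration of simple equal-width binning (not Weka's code), the sketch below splits an attribute's observed range into `bins` intervals of equal width and returns the bin index of each value. The function name, the default of 10 bins and the rule that a value lying exactly on a cut point goes to the upper bin are assumptions of this sketch; Weka's own cut-point handling may differ.

```python
def equal_width_bins(values, bins=10):
    """Assign each numeric value a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: one interval holds everything
    width = (hi - lo) / bins
    # the maximum lands exactly on the top edge, so clamp it into the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


print(equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5))
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4]
```

Varying `bins` (for example 3, 5 and 10) mirrors the instruction to try up to three distinct parameter values.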

2. Attribute selection: explore the 'Select attributes' tab with the following evaluators.

weka>attributeSelection>

- FilteredSubsetEval
- WrapperSubsetEval
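A wrapper evaluator scores a candidate attribute subset by how well a learning scheme performs when restricted to that subset. The sketch below is a simplified, assumption-laden illustration of the idea, not Weka's WrapperSubsetEval (which is configured with its own classifier and cross-validation and is paired with a separate search method): here the wrapped learner is a 1-nearest-neighbour classifier scored by leave-one-out accuracy, and the search is greedy forward selection. All function names and the toy data are hypothetical.

```python
import math


def loo_1nn_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-nearest-neighbour using only the given attribute indices."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, math.inf
        for j, other in enumerate(rows):
            if i == j:
                continue  # leave the test row out of the "training" set
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, evaluate=loo_1nn_accuracy):
    """Greedily add the attribute that most improves the wrapped learner's score."""
    selected, best_score = [], 0.0
    remaining = set(range(len(rows[0])))
    while remaining:
        # score every one-attribute extension of the current subset
        score, feature = max((evaluate(rows, labels, selected + [f]), f) for f in sorted(remaining))
        if score <= best_score:
            break  # no extension improves the estimate: stop searching
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical data: attribute 0 separates the classes, attribute 1 is noise
rows = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 2.0), (1.1, 8.0), (1.2, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(forward_selection(rows, labels))  # ([0], 1.0)
```

Because the evaluation re-runs the learner for every candidate subset, wrapper methods are usually much slower than filter-style evaluators, which is worth noting when comparing the two in the report.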

3. Association mining

weka>associations>

- Apriori
- FPGrowth
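Apriori finds frequent itemsets level by level: frequent (k-1)-itemsets are joined into k-item candidates, candidates with an infrequent (k-1)-subset are pruned, and the survivors are counted against the transactions. The sketch below illustrates those steps; it is not Weka's Apriori. Assumptions: support is given as an absolute transaction count (Weka expresses support as a fraction), the join step is a simplified "union of two (k-1)-itemsets of size k" rather than the textbook prefix join, and the baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for every itemset in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count individual items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # join step (simplified): unions of two frequent (k-1)-itemsets with exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # prune step: drop candidates that have an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # count step: one pass over the transactions per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


baskets = [  # hypothetical market-basket transactions
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_count=3)
print(frequent[frozenset({"diaper", "beer"})])  # 3
```

Here {bread, milk, diaper} survives pruning but occurs in only 2 baskets, so no 3-itemset is frequent and the search stops after level 3.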

4. Classification**

weka>classifiers>bayes>

- NaiveBayes

weka>classifiers>lazy>

- IB1
- IBk

weka>classifiers>trees>

- SimpleCart
- RandomTree
- ID3
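IB1 is a 1-nearest-neighbour classifier and IBk generalises it to k neighbours. The sketch below shows the core idea in plain Python (3.8+, for `math.dist`): rank the training instances by Euclidean distance to the query and take a majority vote among the k closest. It is not Weka's implementation; in particular it does not normalise attribute ranges before computing distances, and the function name and training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=1):
    """Majority label among the k training rows closest (Euclidean) to `query`; k=1 is 1-NN."""
    by_distance = sorted(
        zip(train_rows, train_labels), key=lambda pair: math.dist(pair[0], query)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # on a tied vote, most_common keeps first-encountered order, so the label
    # whose member is closest to the query wins
    return Counter(nearest_labels).most_common(1)[0][0]


train = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
labels = ["a", "a", "b", "b", "b", "b"]
print(knn_predict(train, labels, (1.2, 1.4), k=1))  # a
print(knn_predict(train, labels, (4.0, 5.0), k=3))  # b
```

Trying k = 1, 3 and 5 on the same dataset fits the instruction to vary a parameter over up to three values.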

5. Clustering**

weka>clusterers>

- SimpleKMeans
- FarthestFirst
- DBSCAN
- HierarchicalClusterer
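SimpleKMeans is based on the k-means procedure. As an illustration only, the sketch below implements plain k-means (Lloyd's algorithm): start from k randomly chosen data points, then alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its members until no assignment changes. It omits the extra options Weka's SimpleKMeans provides; the function name, the random initialisation and the toy points are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=1):
    """Lloyd's k-means on tuples of floats; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignment = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid (Euclidean)
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, assignment = kmeans(points, 2)
print(assignment)  # the first three and the last three points end up in different clusters
```

Because the result depends on the initial centroids, re-running with different seeds (Weka's SimpleKMeans also has a seed parameter) is a natural variation to report.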

*Prescribed format:

| Dataset | Task | Algorithm/Filter | Parameters | Observations | Inference | Remarks |
|---|---|---|---|---|---|---|
| Iris | Preprocessing | Unsupervised -> Standardize | ignoreClass: false | For all attributes, mean = 0 and standard deviation = 1 | The dataset has been standardized | |
| Iris | Preprocessing | Unsupervised -> Normalize | scale: 1.0, translation: 0.0 | The minimum and maximum of all attributes are 0 and 1, respectively | The dataset has been normalized to [0, 1] | |
| Iris | Preprocessing | Unsupervised -> Normalize | scale: 2.0, translation: 0.0 | The minimum and maximum of all attributes are 0 and 2, respectively | The dataset has been normalized to [0, 2] | |
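The observations in the table follow from two formulas: Normalize maps each numeric value x to (x - min) / (max - min) * scale + translation, and Standardize maps it to (x - mean) / sd. The sketch below reproduces both on one attribute so the expected minimum, maximum, mean and standard deviation can be checked by hand. It is an illustration, not Weka's code; the function names, the use of the sample (n - 1) standard deviation, the handling of constant attributes and the four sepal-length values are assumptions.

```python
import statistics


def normalize(values, scale=1.0, translation=0.0):
    """Min-max normalisation of one numeric attribute onto [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # constant attribute: no range to stretch (mapping it to `translation` is an assumption)
        return [translation for _ in values]
    return [(v - lo) / span * scale + translation for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the sample (n - 1) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant attribute (an assumption of this sketch)
    return [(v - mean) / sd for v in values]


sepal_length = [4.3, 5.0, 5.8, 7.9]  # hypothetical values for one Iris attribute
print(normalize(sepal_length))             # smallest value -> 0.0, largest -> 1.0
print(normalize(sepal_length, scale=2.0))  # smallest value -> 0.0, largest -> 2.0
print(standardize(sepal_length))           # mean 0, standard deviation 1
```

By the same formula, scale = 2.0 with translation = -1.0 would place the values in [-1, 1].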

** Graphs are to be drawn comparing the accuracies obtained under each of the following variations:

- Applying different algorithms to the same dataset.
- Adding 10%, 20%, 30%, 40% and 50% noise.
- Applying the same algorithm to different datasets.
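For the noise variation, a given percentage of values is deliberately corrupted before the classifier is trained and evaluated. The sketch below shows one way to do this on a class column: it changes exactly round(percent% of n) labels, each to a different class chosen at random, and measures how far the noisy column still agrees with the original. It is only an assumption about the procedure, not Weka's AddNoise filter; the function names and the Iris-sized label column are hypothetical. In the actual experiment the noisy data would be passed to a classifier and its accuracy plotted against the noise level.

```python
import random


def add_label_noise(labels, percent, seed=1):
    """Copy `labels` and change exactly round(percent% of n) entries to a different class."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    noisy = list(labels)
    if len(classes) < 2:
        return noisy  # a single class leaves nothing to swap to
    n_changes = round(len(labels) * percent / 100)
    # distinct positions, so no entry is changed twice
    for i in rng.sample(range(len(labels)), n_changes):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


def agreement(a, b):
    """Fraction of positions where the two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(b)


# Hypothetical 3-class label column of 150 instances (Iris-sized)
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
for pct in (10, 20, 30, 40, 50):
    noisy = add_label_noise(labels, pct)
    print(pct, agreement(noisy, labels))
```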

References for the data sets to be used for the experiments:

1. http://archive.ics.uci.edu/ml/
2. http://www.kdnuggets.com/datasets/index.html
3. https://wiki.csc.calpoly.edu/datasets/wiki/apriori (for Association Mining)
