Data Mining with Weka
Class 3 – Lesson 1
Simplicity first!
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.1 Simplicity first!
Class 1 – Getting started with Weka
Class 2 – Evaluation
Class 3 – Simple classifiers
  Lesson 3.1 Simplicity first!
  Lesson 3.2 Overfitting
  Lesson 3.3 Using probabilities
  Lesson 3.4 Decision trees
  Lesson 3.5 Pruning decision trees
  Lesson 3.6 Nearest neighbor
Class 4 – More classifiers
Class 5 – Putting it all together
Lesson 3.1 Simplicity first!
Simple algorithms often work very well!
There are many kinds of simple structure, e.g.:
– One attribute does all the work (Lessons 3.1, 3.2)
– Attributes contribute equally and independently (Lesson 3.3)
– A decision tree that tests a few attributes (Lessons 3.4, 3.5)
– Calculate distance from training instances (Lesson 3.6)
– Result depends on a linear combination of attributes (Class 4)
Success of method depends on the domain
– Data mining is an experimental science
Lesson 3.1 Simplicity first!
OneR: One attribute does all the work
Learn a 1‐level “decision tree”
– i.e., rules that all test one particular attribute
Basic version
– One branch for each value
– Each branch assigns most frequent class
– Error rate: proportion of instances that don’t belong to the
majority class of their corresponding branch
– Choose attribute with smallest error rate
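The basic version above fits in a few lines of code. Here is a minimal sketch for nominal attributes, using a tiny hard-coded toy dataset; it is an illustration of the idea, not Weka's implementation.

    // A minimal sketch of the basic OneR idea for nominal attributes
    // (toy hard-coded data; an illustration of the idea, not Weka's code).
    import java.util.HashMap;
    import java.util.Map;

    public class OneRSketch {
        public static void main(String[] args) {
            // Columns: outlook, windy, play (last column is the class).
            String[][] data = {
                {"sunny", "false", "no"},     {"sunny", "true", "no"},
                {"overcast", "false", "yes"}, {"overcast", "true", "yes"},
                {"rainy", "false", "yes"},    {"rainy", "true", "no"}
            };
            int classCol = data[0].length - 1;
            int bestAttr = -1, bestErrors = Integer.MAX_VALUE;
            for (int a = 0; a < classCol; a++) {
                // Count class frequencies for each value of attribute a.
                Map<String, Map<String, Integer>> counts = new HashMap<>();
                for (String[] row : data)
                    counts.computeIfAbsent(row[a], v -> new HashMap<>())
                          .merge(row[classCol], 1, Integer::sum);
                // Each branch predicts its majority class; the rest are errors.
                int errors = 0;
                for (Map<String, Integer> byClass : counts.values()) {
                    int total = 0, max = 0;
                    for (int c : byClass.values()) { total += c; max = Math.max(max, c); }
                    errors += total - max;
                }
                if (errors < bestErrors) { bestErrors = errors; bestAttr = a; }
            }
            System.out.println("Rule uses attribute " + bestAttr
                    + " (" + bestErrors + "/" + data.length + " errors)");
        }
    }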
Lesson 3.1 Simplicity first!
[Slide figure: the weather data table alongside OneR's evaluation of each attribute, e.g. Temp: Hot -> No* (2/4 errors, 5/14 total); Windy: False -> Yes (2/8 errors, 5/14 total)]
Use OneR
Open file weather.nominal.arff
Choose OneR rule learner (rules>OneR)
Look at the rule (note: with 10-fold cross-validation Weka runs OneR 11 times: once for each fold, plus once on the full dataset to produce the rule that is displayed)
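The same experiment can be run from the Weka Java API. A minimal sketch, assuming weather.nominal.arff is in the working directory; OneR, Evaluation and DataSource are Weka classes.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunOneR {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("weather.nominal.arff");
            data.setClassIndex(data.numAttributes() - 1);   // "play" is the class

            OneR oner = new OneR();
            oner.buildClassifier(data);                     // build on the full dataset
            System.out.println(oner);                       // the one-attribute rule

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new OneR(), data, 10, new Random(1));  // 10-fold CV
            System.out.println(eval.toSummaryString());
        }
    }

The ten cross-validation runs produce the accuracy estimate; the extra buildClassifier call on the full dataset is the run that produces the rule that gets displayed, which is why OneR is run eleven times in total.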
Lesson 3.1 Simplicity first!
OneR: One attribute does all the work
Incredibly simple method, described in 1993
“Very Simple Classification Rules Perform Well on Most Commonly Used Datasets”
– Experimental evaluation on 16 datasets
– Used cross‐validation
– Simple rules often outperformed far more complex methods
How can it work so well?
– some datasets really are simple
– some are so small/noisy/complex that nothing can be
learned from them!
Course text
  Section 4.1 Inferring rudimentary rules
(Photo: Rob Holte, Alberta, Canada, author of the 1993 paper)
Data Mining with Weka
Class 3 – Lesson 2
Overfitting
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.2 Overfitting
Any machine learning method may “overfit” the training data …
… by producing a classifier that fits the training data too tightly
Works well on training data but not on independent test data
Remember the “User classifier”? Imagine tediously putting a tiny circle
around every single training data point
Overfitting is a general problem
… we illustrate it with OneR
Lesson 3.2 Overfitting
Numeric attributes
– OneR discretizes a numeric attribute by sorting its values and placing breakpoints where the class changes
– taken to the extreme this gives a separate interval for almost every value: a rule that fits the training data very closely but overfits
OneR has a parameter that limits the complexity of such rules
(minBucketSize: the minimum number of instances per interval)
How exactly does it work? Not so important …
Lesson 3.2 Overfitting
Experiment with OneR
Open file weather.numeric.arff
Choose OneR rule learner (rules>OneR)
Resulting rule is based on outlook attribute, so remove outlook
Rule is based on humidity attribute
humidity:  < 82.5  -> yes
           >= 82.5 -> no
(10/14 instances correct)
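The same steps can be scripted against the Weka Java API. A minimal sketch, assuming weather.numeric.arff is in the working directory and that outlook is the first attribute.

    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneRWithoutOutlook {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("weather.numeric.arff");
            data.deleteAttributeAt(0);                    // drop "outlook" (assumed to be attribute 1)
            data.setClassIndex(data.numAttributes() - 1); // class = "play"
            OneR oner = new OneR();
            oner.buildClassifier(data);
            System.out.println(oner);                     // now a rule on humidity
        }
    }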
Lesson 3.2 Overfitting
Experiment with diabetes dataset
Open file diabetes.arff
Choose ZeroR rule learner (rules>ZeroR)
Use cross‐validation: 65.1%
Choose OneR rule learner (rules>OneR)
Use cross‐validation: 72.1%
Look at the rule (plas = plasma glucose concentration)
Change minBucketSize parameter to 1: 54.9%
Evaluate on training set: 86.6%
Look at rule again
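A minimal sketch of the same comparison from the Weka Java API, assuming diabetes.arff is in the working directory; it contrasts the honest cross-validated estimate with the optimistic training-set figure.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneROverfitting {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("diabetes.arff");
            data.setClassIndex(data.numAttributes() - 1);

            OneR oner = new OneR();
            oner.setMinBucketSize(1);          // tiny buckets: invites overfitting

            // Honest estimate: 10-fold cross-validation.
            Evaluation cv = new Evaluation(data);
            cv.crossValidateModel(oner, data, 10, new Random(1));
            System.out.printf("Cross-validation: %.1f%%%n", cv.pctCorrect());

            // Misleadingly optimistic: evaluate on the data used for training.
            oner.buildClassifier(data);
            Evaluation train = new Evaluation(data);
            train.evaluateModel(oner, data);
            System.out.printf("Training set:     %.1f%%%n", train.pctCorrect());
        }
    }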
Lesson 3.2 Overfitting
Overfitting is a general phenomenon that plagues all ML methods
One reason why you must never evaluate on the training set
Overfitting can occur more generally
E.g. try many ML methods and choose the best one for your data
– you cannot expect to get the same performance on new test data
Divide data into training, test, validation sets?
Course text
Section 4.1 Inferring rudimentary rules
Data Mining with Weka
Class 3 – Lesson 3
Using probabilities
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.3 Using probabilities
(OneR: One attribute does all the work)
Opposite strategy: use all the attributes
“Naïve Bayes” method
Two assumptions: Attributes are
– equally important a priori
– statistically independent (given the class value)
i.e., knowing the value of one attribute says nothing about the
value of another (if the class is known)
Independence assumption is never correct!
But … often works well in practice
Lesson 3.3 Using probabilities
Probability of event H given evidence E:

    Pr[H | E] = Pr[E | H] Pr[H] / Pr[E]

where H is the class and E is the instance.
Pr[H] is the a priori probability of H
– probability of the event before evidence is seen
Pr[H | E] is the a posteriori probability of H
– probability of the event after evidence is seen
"Naïve" assumption:
– evidence splits into parts (one per attribute) that are independent given the class:

    Pr[H | E] = Pr[E1 | H] Pr[E2 | H] … Pr[En | H] Pr[H] / Pr[E]

e.g. for the weather data, for a new day with outlook = sunny, temperature = cool, humidity = high, windy = true:

    Pr[yes | E] = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]
Lesson 3.3 Using probabilities
Use Naïve Bayes
Open file weather.nominal.arff
Choose Naïve Bayes method (bayes>NaiveBayes)
Look at the classifier
Avoid zero frequencies: start all counts at 1
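A minimal sketch of the same activity from the Weka Java API, assuming weather.nominal.arff is in the working directory; printing the model shows the counts behind each probability estimate.

    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunNaiveBayes {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("weather.nominal.arff");
            data.setClassIndex(data.numAttributes() - 1);
            NaiveBayes nb = new NaiveBayes();
            nb.buildClassifier(data);
            // The printed model lists, for each attribute, the count of each
            // value given each class; the counts are one higher than the raw
            // frequencies in the data because all counts start at 1.
            System.out.println(nb);
        }
    }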
Lesson 3.3 Using probabilities
“Naïve Bayes”: all attributes contribute equally and independently
Works surprisingly well
– even if independence assumption is clearly violated
Why?
– classification doesn’t need accurate probability estimates
so long as the greatest probability is assigned to the correct class
Adding redundant attributes causes problems
(e.g. identical attributes) – remedy: attribute selection
Course text
Section 4.2 Statistical modeling
Data Mining with Weka
Class 3 – Lesson 4
Decision trees
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.4 Decision trees
Top‐down: recursive divide‐and‐conquer
Select attribute for root node
– Create branch for each possible attribute value
Split instances into subsets
– One for each branch extending from the node
Repeat recursively for each branch
– using only instances that reach the branch
Stop
– if all instances have the same class
Lesson 3.4 Decision trees
Which attribute to select?
Lesson 3.4 Decision trees
Which is the best attribute?
Aim: to get the smallest tree
Heuristic
– choose the attribute that produces the “purest” nodes
– I.e. the greatest information gain
Information theory: measure information in bits
entropy(p1, p2, …, pn) = –p1 log p1 – p2 log p2 – … – pn log pn
Information gain
Amount of information gained by knowing the value of the attribute
(Entropy of distribution before the split) – (entropy of distribution after it)
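A small plain-Java sketch of this calculation, using the weather data's class counts (9 yes, 5 no) and the three-way split on outlook:

    // Plain Java; base-2 logarithms, counts taken from the weather data.
    public class InfoGain {
        // entropy(p1, ..., pn), computed from class counts
        static double entropy(double... counts) {
            double total = 0, e = 0;
            for (double c : counts) total += c;
            for (double c : counts)
                if (c > 0) e -= (c / total) * (Math.log(c / total) / Math.log(2));
            return e;
        }

        public static void main(String[] args) {
            // Whole weather dataset: 9 yes, 5 no.
            double before = entropy(9, 5);
            // After splitting on outlook: sunny [2 yes, 3 no],
            // overcast [4 yes, 0 no], rainy [3 yes, 2 no].
            double after = (5.0 / 14) * entropy(2, 3)
                         + (4.0 / 14) * entropy(4, 0)
                         + (5.0 / 14) * entropy(3, 2);
            System.out.printf("gain(outlook) = %.3f bits%n", before - after);
        }
    }

This prints gain(outlook) = 0.247 bits.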
Which attribute to select?
Continue to split …
– e.g. within the outlook = sunny branch:
  gain(temperature) = 0.571 bits
  gain(windy)       = 0.020 bits
  gain(humidity)    = 0.971 bits
Lesson 3.4 Decision trees
Use J48 on the weather data
Open file weather.nominal.arff
Choose J48 decision tree learner (trees>J48)
Look at the tree
Use right‐click menu to visualize the tree
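The same activity from the Weka Java API, assuming weather.nominal.arff is in the working directory; printing the classifier gives the text form of the tree (the Explorer's right-click visualization has no one-line equivalent).

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunJ48 {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("weather.nominal.arff");
            data.setClassIndex(data.numAttributes() - 1);
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);   // text rendering of the decision tree
        }
    }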
Lesson 3.4 Decision trees
J48: “top‐down induction of decision trees”
Soundly based in information theory
Produces a tree that people can understand
Many different criteria for attribute selection
– rarely make a large difference
Needs further modification to be useful in practice
(next lesson)
Course text
Section 4.3 Divide‐and‐conquer: Constructing decision trees
Data Mining with Weka
Class 3 – Lesson 5
Pruning decision trees
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.5 Pruning decision trees
Highly branching attributes — Extreme case: ID code
ID code  Outlook   Temp  Humidity  Wind   Play
a        Sunny     Hot   High      False  No
b        Sunny     Hot   High      True   No
c        Overcast  Hot   High      False  Yes
d        Rainy     Mild  High      False  Yes
e        Rainy     Cool  Normal    False  Yes
f        Rainy     Cool  Normal    True   No
g        Overcast  Cool  Normal    True   Yes
h        Sunny     Mild  High      False  No
i        Sunny     Cool  Normal    False  Yes
j        Rainy     Mild  Normal    False  Yes
k        Sunny     Mild  Normal    True   Yes
l        Overcast  Mild  High      True   Yes
m        Overcast  Hot   Normal    False  Yes
n        Rainy     Mild  High      True   No
Information gain for the ID code attribute is maximal (0.940 bits)
Lesson 3.5 Pruning decision trees
How to prune?
Don’t continue splitting if the nodes get very small
(J48 minNumObj parameter, default value 2)
Build full tree and then work back from the leaves, applying a
statistical test at each stage
(confidenceFactor parameter, default value 0.25)
Sometimes it’s good to prune an interior node, raising the
subtree beneath it up one level
(subtreeRaising, default true)
Messy … complicated … not particularly illuminating
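The same options can be set programmatically. A minimal sketch using J48's setters, with the default values quoted above:

    import weka.classifiers.trees.J48;

    public class J48PruningOptions {
        public static void main(String[] args) {
            J48 tree = new J48();
            tree.setMinNumObj(2);             // minimum number of instances per leaf
            tree.setConfidenceFactor(0.25f);  // smaller values prune more heavily
            tree.setSubtreeRaising(true);     // allow raising a subtree while pruning
            // tree.setUnpruned(true);        // or switch pruning off altogether
            System.out.println(String.join(" ", tree.getOptions()));
        }
    }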
Lesson 3.5 Pruning decision trees
Over‐fitting (again!)
Sometimes simplifying a decision tree gives better results
Open file diabetes.arff
Choose J48 decision tree learner (trees>J48)
Prunes by default: 73.8% accuracy; tree has 20 leaves, 39 nodes
Turn off pruning:  72.7% accuracy; 22 leaves, 43 nodes
Extreme example: breast‐cancer.arff
Default (pruned): 75.5% accuracy; tree has 4 leaves, 6 nodes
Unpruned:         69.6% accuracy; 152 leaves, 179 nodes
Lesson 3.5 Pruning decision trees
C4.5/J48 is a popular early machine learning method
Many different pruning methods
– mainly change the size of the pruned tree
Pruning is a general technique that can apply to
structures other than trees (e.g. decision rules)
Univariate vs. multivariate decision trees
– Single vs. compound tests at the nodes
From C4.5 to J48 (recall Lesson 1.4)

Data Mining with Weka
Class 3 – Lesson 6
Nearest neighbor
Ian H. Witten
Department of Computer Science
University of Waikato
New Zealand
weka.waikato.ac.nz
Lesson 3.6 Nearest neighbor
“Rote learning”: simplest form of learning
To classify a new instance, search training set for one
that’s “most like” it
– the instances themselves represent the “knowledge”
– lazy learning: do nothing until you have to make predictions
“Instance‐based” learning = “nearest‐neighbor” learning
Lesson 3.6 Nearest neighbor
Search training set for one that’s “most like” it
Need a similarity function
– Regular (“Euclidean”) distance? (sum of squares of differences)
– Manhattan (“city‐block”) distance? (sum of absolute differences)
– Nominal attributes? Distance = 1 if different, 0 if same
– Normalize the attributes to lie between 0 and 1?
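A small plain-Java sketch of the two distance functions with min-max normalization; the attribute ranges and the two instances are illustrative.

    public class Distances {
        // Euclidean: square root of the sum of squared differences
        // (the square root does not change which neighbour is nearest).
        static double euclidean(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }

        // Manhattan ("city-block"): sum of absolute differences.
        static double manhattan(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
            return sum;
        }

        // Scale each attribute to [0,1] using its min and max over the training
        // set, so that no attribute dominates just because of its units.
        static double[] normalize(double[] x, double[] min, double[] max) {
            double[] out = new double[x.length];
            for (int i = 0; i < x.length; i++)
                out[i] = (x[i] - min[i]) / (max[i] - min[i]);
            return out;
        }

        public static void main(String[] args) {
            double[] min = {64, 65}, max = {85, 96};      // e.g. temperature, humidity
            double[] a = normalize(new double[]{83, 86}, min, max);
            double[] b = normalize(new double[]{70, 96}, min, max);
            System.out.println("Euclidean: " + euclidean(a, b));
            System.out.println("Manhattan: " + manhattan(a, b));
        }
    }

Weka's IBk uses a normalized Euclidean distance of this kind by default.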
Lesson 3.6 Nearest neighbor
What about noisy instances?
Nearest‐neighbor -> k‐nearest‐neighbors
– choose majority class among several neighbors (k of them)
In Weka: lazy>IBk (instance‐based learning)
Lesson 3.6 Nearest neighbor
Investigate effect of changing k
Glass dataset
lazy > IBk, k = 1, 5, 20
10‐fold cross‐validation
k = 1   70.6%
k = 5   67.8%
k = 20  65.4%
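A minimal sketch of the same experiment from the Weka Java API, assuming glass.arff is in the working directory; exact percentages depend on the cross-validation seed.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class RunIBk {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("glass.arff");
            data.setClassIndex(data.numAttributes() - 1);
            for (int k : new int[]{1, 5, 20}) {
                IBk ibk = new IBk();
                ibk.setKNN(k);                                // number of neighbours
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(ibk, data, 10, new Random(1));
                System.out.printf("k = %2d: %.1f%% correct%n", k, eval.pctCorrect());
            }
        }
    }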
Lesson 3.6 Nearest neighbor
Often very accurate … but slow:
– scan entire training data to make each prediction?
– sophisticated data structures can make this faster
Assumes all attributes equally important
– Remedy: attribute selection or weights
Remedies against noisy instances:
– Majority vote over the k nearest neighbors
– Weight instances according to prediction accuracy
– Identify reliable “prototypes” for each class
Statisticians have used k‐NN since 1950s
– If training set size n → ∞ and k → ∞, with k/n → 0, the error approaches the minimum
Course text
Section 4.7 Instance‐based learning
Data Mining with Weka
Department of Computer Science
University of Waikato
New Zealand
Creative Commons Attribution 3.0 Unported License
creativecommons.org/licenses/by/3.0/
weka.waikato.ac.nz