
Getting Started with WEKA Data Mining

The document provides steps to get started with data mining using the WEKA tool: 1) Open WEKA and select the Explorer tool for preprocessing, classification, and clustering tasks. 2) Import a CSV dataset and transform attributes from numeric to nominal values using a filter. 3) Split the dataset into 70% for training and 30% for testing using randomization and percentage filters. 4) Train classification models like OneR using cross-validation on the training set and evaluate on the test set. 5) Perform association rule learning using the Apriori algorithm to generate rules predicting class attributes.


Data mining with WEKA

A use case to help you get started

Charalampos Mavroforakis
BU CS105, Fall 2011
Starting WEKA

Open Weka: Start > All Programs > Weka 3.x.x > Weka 3.x

From the "Weka GUI Chooser", pick "Explorer". This is the
main WEKA tool that we are going to use.
Opening a dataset

To open a dataset ([Link] file in our case), we click "Open file..." in the
Preprocess [Link] that in the open menu you have to choose csv [Link]
[Link]
Transforming values to nominal (if needed)

Weka classified every attribute in our dataset as numeric, so we have to manually transform
[Link], [Link], which is in Unsupervised > [Link], [Link], the
most interesting one here is the attributeIndices, which enumerates all the attributes that you
[Link], we click Apply.
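
To make the idea of a numeric-to-nominal conversion concrete, here is a minimal Python sketch. It is not how Weka implements its filter; the helper names `to_nominal` and `nominal_values` and the toy data are hypothetical, and the 1-based indexing is only assumed to mirror how attributeIndices counts attributes.

```python
def to_nominal(rows, attribute_indices):
    """Return a copy of rows with the chosen columns turned into labels.

    attribute_indices is 1-based (an assumption of this sketch).
    """
    columns = {i - 1 for i in attribute_indices}  # convert to 0-based
    converted = []
    for row in rows:
        converted.append([
            str(value) if position in columns else value
            for position, value in enumerate(row)
        ])
    return converted


def nominal_values(rows, attribute_index):
    """List the distinct labels of one (1-based) column, sorted."""
    return sorted({row[attribute_index - 1] for row in rows})


# Hypothetical data: the last column is a 0/1 class stored as a number.
data = [[5.1, 1], [4.9, 0], [6.3, 1]]
nominal_data = to_nominal(data, [2])
```

After the conversion the second column holds labels rather than quantities, so a learner would treat "0" and "1" as two categories instead of two points on a number line.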
Splitting the dataset

We have to split the dataset into two, 30% testing and 70% [Link], we first
Randomize the dataset (Unsupervised > Instance), so that we create a random permutation.

Then we apply RemovePercentage (Unsupervised > Instance) with percentage 30 and save the
resulting dataset as training.

After that, we undo and apply the same filter choosing invertSelection [Link]
the rest of the data (30%) so we save them as the testing.
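
The randomize, remove, undo, and invert sequence above can be sketched in a few lines of Python. This is an illustration of the idea only, not Weka's code; the function names, the `seed` argument, and the 100-instance stand-in data are assumptions made for the example.

```python
import random


def randomize(rows, seed=42):
    """Return a shuffled copy of rows (a random permutation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def remove_percentage(rows, percentage, invert_selection=False):
    """Drop the first `percentage` percent of rows.

    With invert_selection=True, keep exactly the rows that would have
    been dropped instead.
    """
    cut = round(len(rows) * percentage / 100)
    return rows[:cut] if invert_selection else rows[cut:]


rows = list(range(100))          # hypothetical stand-in for 100 instances
shuffled = randomize(rows)
training = remove_percentage(shuffled, 30)                        # 70%
testing = remove_percentage(shuffled, 30, invert_selection=True)  # 30%
```

Both calls work on the same shuffled copy, which is the role the undo step plays in the Explorer: shuffling again before the second call could place the same instance in both sets.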
Training models

[Link] "Classify" and we
[Link]'s start with OneR, which is the same as the one we saw in class.
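
As a reminder of what OneR does, here is a minimal Python sketch for nominal attributes: it builds one "value -> majority class" rule per attribute and keeps the attribute whose rule makes the fewest training errors. It is a simplification and not Weka's implementation (no numeric attributes, no missing values); the function names and the toy weather data are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_r(rows, class_index=-1):
    """Pick the single attribute whose value -> class rule errs least.

    Returns (attribute_index, rule, default_class), where rule maps each
    attribute value to the majority class seen with that value.
    """
    n_columns = len(rows[0])
    class_column = class_index % n_columns
    default_class = Counter(
        row[class_column] for row in rows
    ).most_common(1)[0][0]

    best = None
    for attribute in range(n_columns):
        if attribute == class_column:
            continue
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[class_column]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attribute]] != row[class_column] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, attribute, rule)

    _, attribute, rule = best
    return attribute, rule, default_class


def predict_one_r(model, row):
    """Apply the rule; fall back to the overall majority class."""
    attribute, rule, default_class = model
    return rule.get(row[attribute], default_class)


# Hypothetical toy data: outlook, windy, play (class).
weather = [
    ["sunny", "no", "no"],
    ["sunny", "yes", "no"],
    ["overcast", "no", "yes"],
    ["rainy", "no", "yes"],
    ["rainy", "yes", "no"],
    ["overcast", "yes", "yes"],
    ["rainy", "no", "yes"],
]
model = train_one_r(weather)
```

On this toy data the outlook rule misclassifies one training row and the windy rule two, so the sketch keeps outlook.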

[Link]
want to see how good OneR is as a model, so we use cross validation, and only after that
will we go and check what it predicts on the unseen data.
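
The cross-validation idea can be sketched as follows: split the training data into folds, train on all folds but one, measure accuracy on the held-out fold, and average. This is a bare-bones illustration, not Weka's procedure (it neither shuffles nor balances classes across folds). The majority-class trainer is a hypothetical stand-in so that the block stays self-contained; any function that takes rows and returns a predictor could be passed instead.

```python
from collections import Counter


def majority_class_trainer(training_rows):
    """Baseline trainer: always predict the most frequent class."""
    majority = Counter(row[-1] for row in training_rows).most_common(1)[0][0]
    return lambda row: majority


def cross_validate(rows, train, folds=10):
    """Average accuracy over `folds` train/validate rounds.

    Every row is used for validation exactly once; the model that
    predicts it is trained on the other folds only.
    """
    accuracies = []
    for fold in range(folds):
        held_out = rows[fold::folds]  # every folds-th row, starting at fold
        kept = [r for i, r in enumerate(rows) if i % folds != fold]
        predict = train(kept)
        correct = sum(predict(r) == r[-1] for r in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / folds


# Hypothetical data: 20 rows, 15 of class "a" and 5 of class "b".
rows = [[i, "a"] for i in range(15)] + [[i, "b"] for i in range(5)]
average_accuracy = cross_validate(rows, majority_class_trainer, folds=5)
```

The testing set plays no part here; it stays unseen until the model has been judged on the training data alone.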

In the output, we get information about the average accuracy and the confusion matrix of
our model.
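
For readers who want to see how these two numbers relate, here is a small Python sketch that builds a confusion matrix from actual and predicted labels and derives the accuracy from its diagonal. The function names and the six example labels are hypothetical; the row and column convention (rows are actual classes, columns are predicted classes) is an assumption of the sketch.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts rows of class labels[i] predicted as labels[j]."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of rows on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical labels and predictions for six instances.
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]
matrix = confusion_matrix(actual, predicted, labels=["yes", "no"])
```

The off-diagonal cells show which classes get confused with which, something the single accuracy figure hides.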

In order to check how well we do on the unseen data, we select "supplied test set", we open
[Link]
the algorithm again and we notice the differences in the confusion matrix and the accuracy.
Association learning

If all of our attributes are nominal (in case they are not, we can discretize them in the
Preprocess tab) [Link], we switch to the
Associate tab and we choose the Apriori [Link]
parameters if you want.

We could set car to True (so that it produces rules that predict the class attribute) and
[Link] sets the
threshold of confidence and numRules [Link]
result will be a set of rules that predict the class, together with their confidence.
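
To illustrate what "rules that predict the class, together with their confidence" means, here is a small Python sketch. It enumerates antecedents by brute force rather than with Apriori's level-wise candidate pruning, so it is not the Apriori algorithm itself and not Weka's code. The parameter names `min_confidence`, `num_rules`, and `max_antecedent` are hypothetical, and the last column is assumed to be the class.

```python
from itertools import combinations


def class_rules(rows, attribute_names, min_confidence=0.9, num_rules=10,
                max_antecedent=2):
    """Rank rules "attribute=value, ... => class" by confidence.

    Confidence is the share of rows matching the antecedent that also
    carry the predicted class.
    """
    *features, _ = attribute_names
    candidates = {}
    for row in rows:
        items = [(name, value) for name, value in zip(features, row)]
        for size in range(1, max_antecedent + 1):
            for antecedent in combinations(items, size):
                by_class = candidates.setdefault(antecedent, {})
                by_class[row[-1]] = by_class.get(row[-1], 0) + 1

    rules = []
    for antecedent, by_class in candidates.items():
        covered = sum(by_class.values())  # rows matching the antecedent
        for label, hits in by_class.items():
            confidence = hits / covered
            if confidence >= min_confidence:
                rules.append((antecedent, label, hits, confidence))

    # Highest confidence first; break ties by how many rows back the rule.
    rules.sort(key=lambda rule: (-rule[3], -rule[2]))
    return rules[:num_rules]


# Hypothetical toy data: outlook, windy, play (class).
names = ["outlook", "windy", "play"]
weather = [
    ["sunny", "no", "no"],
    ["sunny", "yes", "no"],
    ["overcast", "no", "yes"],
    ["rainy", "no", "yes"],
    ["rainy", "yes", "no"],
    ["overcast", "yes", "yes"],
    ["rainy", "no", "yes"],
]
top_rules = class_rules(weather, names, min_confidence=1.0, num_rules=3)
```

Each returned tuple holds the antecedent, the predicted class, the number of supporting rows, and the confidence, for example "outlook=sunny => play=no" with confidence 1.0 on this toy data.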
