Data mining with WEKA
A use case to help you get started
Charalampos Mavroforakis
BU CS105, Fall 2011
Starting WEKA
Open Weka: Start > All Programs > Weka 3.x.x > Weka 3.x
From the "Weka GUI Chooser", pick "Explorer". This is the
main WEKA tool that we are going to use.
Opening a dataset
To open a dataset ([Link] file in our case), we click "Open file..." in the
Preprocess tab. Notice that in the open menu you have to choose CSV [Link]
[Link]
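Weka decides each attribute's type when it loads the CSV file, which is how 0/1 columns end up as numeric. The sketch below illustrates that idea under a simplifying assumption: a column is treated as numeric whenever all of its non-empty values parse as numbers. It is not Weka's actual CSV loader, and the column names in the sample are hypothetical.

```python
import csv
import io


def looks_numeric(value: str) -> bool:
    """Return True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_types(csv_text: str) -> dict:
    """Guess 'numeric' or 'nominal' for each column of a CSV with a header row.

    Simplified rule (an assumption, not Weka's loader): a column is numeric
    when every non-empty value in it parses as a number.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    types = {}
    for i, name in enumerate(header):
        values = [row[i] for row in data if row[i] != ""]
        types[name] = "numeric" if all(looks_numeric(v) for v in values) else "nominal"
    return types


# Hypothetical sample: 'married' and 'pep' are yes/no flags coded as 0/1.
sample = "age,married,pep\n48,1,0\n40,0,1\n51,1,1\n"
print(guess_types(sample))  # {'age': 'numeric', 'married': 'numeric', 'pep': 'numeric'}
```

Under this rule, yes/no flags coded as 0/1 are indistinguishable from real numbers, which is why they have to be converted by hand.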
Transforming values to nominal (if needed)
Weka classified every attribute in our dataset as numeric, so we have to manually transform
[Link], [Link], which is in
Unsupervised > [Link], [Link], the
most interesting one here is attributeIndices, which enumerates all the attributes that you
[Link], we click Apply.
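A minimal sketch of what a numeric-to-nominal conversion does, assuming each distinct value of a selected column simply becomes a category label. The helper names are hypothetical; `attribute_indices` only mimics the role of the attributeIndices parameter and, like Weka's attribute ranges, counts columns from 1.

```python
def format_label(value):
    """Render a number as a category label: 1.0 -> '1', 2.5 -> '2.5'."""
    value = float(value)
    return str(int(value)) if value.is_integer() else str(value)


def numeric_to_nominal(rows, attribute_indices):
    """Convert the selected columns of `rows` to nominal labels.

    `attribute_indices` lists 1-based column positions (hypothetical stand-in
    for the attributeIndices parameter). Returns the converted rows and, for
    each converted column, its labels in numeric order.
    """
    selected = sorted(i - 1 for i in attribute_indices)  # 1-based -> 0-based
    converted = [
        [format_label(v) if col in selected else v for col, v in enumerate(row)]
        for row in rows
    ]
    labels = {
        col + 1: [format_label(v) for v in sorted({float(row[col]) for row in rows})]
        for col in selected
    }
    return converted, labels


rows = [[48, 1, 0], [40, 0, 1], [51, 1, 1]]
converted, labels = numeric_to_nominal(rows, [2, 3])
print(converted)  # [[48, '1', '0'], [40, '0', '1'], [51, '1', '1']]
print(labels)     # {2: ['0', '1'], 3: ['0', '1']}
```

Column 1 (age) is left numeric because it is not listed, which is the point of restricting the conversion to selected attributes.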
Splitting the dataset
We have to split the dataset into two parts: 30% for testing and 70% for training. To do this, we first
apply Randomize (Unsupervised > Instance), which creates a random permutation of the instances.
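Randomizing amounts to applying a random permutation to the instances. A minimal Fisher-Yates shuffle, assuming a fixed seed for reproducibility; the seed value is arbitrary and `randomize` is a hypothetical helper, not Weka's code.

```python
import random


def randomize(rows, seed=42):
    """Return a shuffled copy of `rows` (Fisher-Yates); the input is not modified.

    The seed makes the permutation reproducible; 42 is an arbitrary choice here.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over positions 0..i inclusive
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


data = list(range(10))
perm = randomize(data, seed=1)
print(perm)
print(sorted(perm) == data)  # True: same instances, new order
```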
Then we apply RemovePercentage (Unsupervised > Instance) with percentage set to 30 and save the
resulting dataset, which holds the remaining 70%, as the training set.
After that, we undo and apply the same filter again with invertSelection set to True. This keeps
only the rest of the data (30%), which we save as the testing set.
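The two RemovePercentage runs can be sketched as follows, assuming the filter removes the first `percentage` percent of the instances (on shuffled data, which block is removed does not matter). The property that matters is that the normal and inverted runs are exact complements, so no instance lands in both sets. `remove_percentage` is a hypothetical helper.

```python
def remove_percentage(rows, percentage, invert_selection=False):
    """Remove `percentage` percent of `rows`, or keep only that part if inverted.

    Assumption: the removed block is the first `percentage` percent of the rows.
    """
    cut = round(len(rows) * percentage / 100)
    return rows[:cut] if invert_selection else rows[cut:]


shuffled = list(range(20))  # stands in for the randomized dataset
training = remove_percentage(shuffled, 30)                        # 14 rows (70%)
testing = remove_percentage(shuffled, 30, invert_selection=True)  # 6 rows (30%)
print(len(training), len(testing))  # 14 6
```

Both calls cut at the same position, so they must run on the same shuffled data; randomizing again between the two runs would let instances leak from the training set into the testing set.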
Training models
[Link] "Classify" and we
[Link] Let's start with OneR, which is the same algorithm we saw in class.
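For reference, a compact sketch of the OneR idea for nominal attributes: build one rule per attribute that maps each value to its majority class, then keep the attribute whose rule makes the fewest errors on the training data. This simplifies Weka's OneR, which also buckets numeric attributes; the toy data are made up.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, value -> class rule, training errors) for OneR.

    Nominal attributes only; ties between attributes go to the lower index.
    """
    best = None
    for attr in range(len(rows[0])):
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # each value predicts its majority class; the minority rows are errors
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Made-up toy data: (outlook, windy) -> play
rows = [("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
        ("rainy", "false"), ("rainy", "true"), ("overcast", "true")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
attr, rule, errors = one_r(rows, labels)
print(attr, errors)  # 0 1
print(rule)
```

Here outlook wins with one training error, against two for windy.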
[Link]
want to see how good OneR is as a model, so we use cross-validation, and only after that
will we go and check what it predicts on the unseen data.
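Cross-validation can be sketched like this: the data are split into k folds, each fold is held out once while the model is trained on the rest, and correct predictions are pooled over all folds. This is plain, unstratified k-fold cross-validation (Weka's own evaluation also stratifies the folds when the class is nominal), and the majority-class baseline below only stands in for a real learner. All names are hypothetical.

```python
from collections import Counter


def cross_validate(rows, labels, train_fn, predict_fn, k=10):
    """Plain (unstratified) k-fold cross-validation returning pooled accuracy.

    Row i goes to fold i % k; each fold is held out once while the model is
    trained on the other k-1 folds. Correct predictions are summed over all
    folds and divided by the number of rows.
    """
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        model = train_fn([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


# Majority-class baseline as a stand-in learner (hypothetical helpers).
def train_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = [[i] for i in range(10)]
labels = ["yes"] * 6 + ["no"] * 4
print(cross_validate(rows, labels, train_majority, predict_majority, k=5))  # 0.6
```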
In the output, we get information about the average accuracy and the confusion matrix of
our model.
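Both numbers in the output come from the same tally of (actual, predicted) pairs. A minimal sketch, assuming the layout Weka prints: one row per actual class and one column per predicted class, with accuracy equal to the diagonal divided by the total. The example predictions are made up.

```python
def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs: rows = actual class, columns = predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


actual    = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
m = confusion_matrix(actual, predicted, ["yes", "no"])
print(m)            # [[2, 1], [1, 1]]
print(accuracy(m))  # 0.6
```

The off-diagonal cells show which kind of mistake the model makes, which a single accuracy figure hides.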
To check how well we do on the unseen data, we select "Supplied test set", we open
[Link]
the algorithm again, and we note how the confusion matrix and the accuracy differ from the cross-validation results.
Association learning
If all of our attributes are nominal (if they are not, we can discretize them in the
Preprocess tab) [Link], we switch to the
Associate tab and choose the Apriori [Link]
parameters if you want.
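A minimal level-wise Apriori sketch for nominal data, where each item is an (attribute, value) pair and support is the fraction of instances that contain an itemset. It uses one fixed `min_support`; as far as we know, Weka's Apriori instead starts from a high support bound and lowers it step by step until it finds numRules rules. Names and data are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Support = fraction of transactions containing the itemset. Candidates of
    size k+1 are unions of frequent k-itemsets, pruned if any k-subset is
    infrequent (the Apriori property).
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    level = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while level:
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        # join step: unions of two frequent k-itemsets with exactly k+1 items
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # prune step: keep a candidate only if all its k-subsets are frequent
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k))}
        k += 1
    return frequent


# Made-up nominal data; each item is an (attribute, value) pair.
data = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
]
frequent = apriori([set(row.items()) for row in data], min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

Representing items as (attribute, value) pairs is why the attributes must be nominal first: a raw numeric value such as an exact age would almost never reach the support threshold.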
We could set car to True (so that it produces rules that predict the class attribute) and
[Link] sets the
minimum confidence threshold and numRules [Link]
result will be a set of rules that predict the class, together with their confidence.
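With car set to True, only rules whose right-hand side is the class attribute are kept, ranked by confidence, where confidence(X => c) = count(X and c) / count(X). The sketch below enumerates such rules by brute force over small antecedents rather than through Apriori's frequent itemsets, which is fine for toy data but not for real datasets. `min_confidence` and `num_rules` only play roles similar to the confidence threshold and numRules; all names and data are hypothetical.

```python
from itertools import combinations


def class_rules(rows, class_attr, min_confidence, max_antecedent=2, num_rules=10):
    """Return up to `num_rules` rules (antecedent, class value, confidence).

    Brute force: every combination of up to `max_antecedent` attribute-value
    pairs seen in the data is tried as an antecedent X, and
    confidence(X => class=c) = count(X and c) / count(X).
    """
    attrs = [a for a in rows[0] if a != class_attr]
    seen, rules = set(), []
    for row in rows:
        pairs = [(a, row[a]) for a in attrs]
        for size in range(1, max_antecedent + 1):
            for antecedent in combinations(pairs, size):
                if antecedent in seen:
                    continue
                seen.add(antecedent)
                covered = [r for r in rows if all(r[a] == v for a, v in antecedent)]
                for cls in sorted({r[class_attr] for r in covered}):
                    confidence = sum(r[class_attr] == cls for r in covered) / len(covered)
                    if confidence >= min_confidence:
                        rules.append((antecedent, cls, confidence))
    rules.sort(key=lambda rule: -rule[2])  # stable: ties keep discovery order
    return rules[:num_rules]


data = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
]
for antecedent, cls, confidence in class_rules(data, "play", 0.6, max_antecedent=1):
    print(antecedent, "=>", cls, round(confidence, 2))
```

Lowering the confidence threshold from 1.0 to 0.6 admits the weaker rule windy=false => play=yes (2 of 3 matching instances), which lands last in the ranking.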