Tutorial Weka - Feature Selection and Classification, Data Mining
5. To set up the feature selection (attribute selection) scenario, first measure the
classification accuracy on the original data (before feature selection), so that you
have a baseline to compare against.
Click the Classify tab, then choose weka > classifiers > bayes > NaiveBayes.
As you know, there are four popular algorithms in data mining.
After you choose one algorithm (for example, Naive Bayes), click the Start button to
run it.
The figure above shows the accuracy performance of the Naive Bayes algorithm on the
breast cancer dataset. As in my paper, I use four performance parameters that several
researchers recommend: Accuracy (TP rate), Precision, Recall, and F-measure. You can
look up their definitions. In this example, the accuracy of Naive Bayes on the breast
cancer dataset is 96%.
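For readers who want to see how the four parameters relate, here is a minimal sketch in Python that computes them from the counts of a two-class confusion matrix. The function name `classification_metrics` and the counts below are hypothetical and chosen only for illustration; they are not the Weka output for the breast cancer dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F-measure for one positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so the sketch never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, not taken from the tutorial's dataset.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Weka reports these values per class and as a weighted average, so the numbers in the Classify output will usually be averages over both classes rather than the single-class values computed here.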
6. Now that we know the accuracy on the original breast cancer dataset (96%), we
want to measure the effect of feature selection. Follow these instructions:
The figure above shows that we choose Ant Colony (AntSearch) as the feature selection
algorithm.
7. Click the Start button to run the algorithm and get the result.
The figure above shows that, according to the Ant Colony evaluation, every feature of
the breast cancer dataset is important: the algorithm selects all 9 of the 9 existing
features as having a high contribution.
For other datasets the result can differ. For example, the Ant Colony algorithm may
select 533 important features out of 2001 existing features (see my paper).
8. For example, if the Ant Colony algorithm selects only 5 of the 9 existing
features, you can remove the other 4 features from the data.
The figure above shows that you can tick, in the feature column, each feature that
the feature selection result marks as not important, and then click the
Remove button to delete them.
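The same removal step can be sketched outside the GUI. The Python example below keeps only the selected features plus the class column and drops the rest. The helper `keep_selected`, the feature names `f1` to `f9`, and the sample rows are all hypothetical; the real names come from your dataset and the real selection comes from the feature selection result.

```python
def keep_selected(header, rows, selected, class_name):
    """Drop every column that is neither a selected feature nor the class."""
    keep = [i for i, name in enumerate(header)
            if name in selected or name == class_name]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Hypothetical header: 9 features plus the class column.
header = [f"f{i}" for i in range(1, 10)] + ["class"]
rows = [
    [5, 1, 1, 1, 2, 1, 3, 1, 1, "benign"],
    [8, 10, 10, 8, 7, 10, 9, 7, 1, "malignant"],
]

# Hypothetical result: 5 of the 9 features were selected.
selected = {"f1", "f2", "f3", "f6", "f7"}

new_header, new_rows = keep_selected(header, rows, selected, "class")
print(new_header)
print(new_rows)
```

Keeping the class column explicitly matters: if it is removed together with the unimportant features, the classifier in the next step has nothing to predict.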
9. As the last step, measure the accuracy performance on the reduced data (the data
after feature selection, not the original data) by applying the classification
algorithm again with the Naive Bayes classifier.
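To make the before/after comparison concrete, here is a small self-contained sketch in Python. It is not Weka's NaiveBayes implementation: it is a simplified Gaussian Naive Bayes written only for illustration, the toy rows and the selected column indices are hypothetical, and for brevity it scores on the training rows instead of using cross-validation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior, feature means and feature variances per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The variance floor is an assumption of this sketch; it avoids
        # division by zero when a feature is constant within a class.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def training_accuracy(rows, labels):
    """Fit on the given rows and report the fraction classified correctly."""
    model = fit_gaussian_nb(rows, labels)
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Hypothetical toy data: three features, two classes.
rows = [[1, 1, 5], [2, 1, 3], [1, 2, 8], [8, 7, 4], [9, 8, 6], [7, 9, 2]]
labels = ["benign"] * 3 + ["malignant"] * 3

# Hypothetical selection result: keep the first two features only.
selected = [0, 1]
reduced_rows = [[row[i] for i in selected] for row in rows]

acc_full = training_accuracy(rows, labels)
acc_reduced = training_accuracy(reduced_rows, labels)
print(f"all features:      {acc_full:.2f}")
print(f"selected features: {acc_reduced:.2f}")
```

If the accuracy on the reduced data stays close to the accuracy on the original data, the removed features contributed little, which is the outcome feature selection aims for.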