Title: WEKA SVEC/CSE/EXPT-DWDM
SREE VIDYANIKETHAN ENGINEERING COLLEGE(AUTONOMOUS)
Sree Sainath Nagar, A. Rangampet – 517 102
Department of Computer Science and Engineering
III B. Tech – II Semester
DATA WAREHOUSING AND DATA MINING LAB (16BT61531)
WEKA: WEEK 1
Aim:
To pre-process data in Weka with simple experiments:
a) Handling missing data (both nominal and numerical)
b) All types of normalization (min-max, z-score, decimal scaling)
c) Sampling
DESCRIPTION:
A) Mark Missing Values
1. Open the Weka Explorer.
2. Load the Pima Indians onset of diabetes dataset.
3. Click the “Choose” button for the Filter and select NumericCleaner; it is under
unsupervised.attribute.NumericCleaner.
Weka Select Numeric Cleaner Data Filter
4. Click on the filter to configure it.
5. Set attributeIndices to 6, the index of the mass attribute.
6. Set minThreshold to 0.1E-8 (close to zero), which is the minimum value allowed for the attribute.
7. Set minDefault to NaN, which is unknown and will replace values below the threshold.
8. Click the “OK” button on the filter configuration.
9. Click the “Apply” button to apply the filter.
Click “mass” in the “attributes” pane and review the details of the “selected attribute”. Notice that the 11
attribute values that were formerly set to 0 are now marked as Missing.
Weka Missing Data Marked
In this example we marked values below a threshold as missing.
You could just as easily replace them with a specific numerical value. You could also
mark values as missing between an upper and lower range of values.
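The marking step can be sketched outside Weka as well. The snippet below is only an illustration in plain Python, not Weka's own code: the function name `mark_below_threshold` and the sample readings are made up for this example, and NaN stands in for Weka's Missing marker.

```python
import math

def mark_below_threshold(values, min_threshold=1e-9, min_default=math.nan):
    """Replace every value below min_threshold with min_default (NaN = missing)."""
    return [min_default if v < min_threshold else v for v in values]

# Hypothetical "mass" readings; a value of 0 cannot be a real measurement.
mass = [33.6, 26.6, 0.0, 23.3, 0.0, 43.1]

cleaned = mark_below_threshold(mass)
missing_count = sum(1 for v in cleaned if math.isnan(v))

print(cleaned)
print("missing:", missing_count)
```

The threshold 1e-9 is the same number as the 0.1E-8 entered for minThreshold in step 6.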
Next, let’s look at how we can remove instances with missing values from our
dataset.
Remove Missing Data
Now that you know how to mark missing values in your data, you need to learn
how to handle them.
A simple way to handle missing data is to remove those instances that have one
or more missing values.
You can do this in Weka using the RemoveWithValues filter.
Continuing on from the above recipe to mark missing values, you can remove
missing values as follows:
1. Click the “Choose” button for the Filter and select RemoveWithValues; it is
under unsupervised.instance.RemoveWithValues.
Weka Select RemoveWithValues Data Filter
2. Click on the filter to configure it.
3. Set attributeIndex to 6, the index of the mass attribute.
4. Set matchMissingValues to “True”.
5. Click the “OK” button to use the configuration for the filter.
6. Click the “Apply” button to apply the filter.
Click “mass” in the “attributes” section and review the details of the “selected
attribute”.
Notice that the 11 instances whose mass value was marked Missing have been removed
from the dataset.
Weka Missing Values Removed
Note, you can undo this operation by clicking the “Undo” button.
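As a rough illustration of what this removal does, the following plain Python sketch drops every row whose chosen attribute is missing. The function name `remove_with_missing` and the sample rows are assumptions made for this example; note that the sketch uses 0-based indices, whereas Weka attribute indices start at 1.

```python
import math

def remove_with_missing(rows, attribute_index):
    """Keep only the rows whose value at attribute_index (0-based) is not NaN."""
    return [row for row in rows if not math.isnan(row[attribute_index])]

# Hypothetical (plas, mass) pairs; NaN marks a missing mass value.
rows = [
    (148.0, 33.6),
    (85.0, math.nan),
    (183.0, 23.3),
    (89.0, math.nan),
]

kept = remove_with_missing(rows, attribute_index=1)
print(len(rows) - len(kept), "instances removed")
```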
Impute Missing Values
Instances with missing values do not have to be removed. Instead, you can replace
the missing values with some other value.
This is called imputing missing values.
It is common to impute missing values with the mean of the numerical
distribution. You can do this easily in Weka using the ReplaceMissingValues
filter.
Continuing on from the first recipe above to mark missing values, you can
impute the missing values as follows:
1. Click the “Choose” button for the Filter and select ReplaceMissingValues;
it is under unsupervised.attribute.ReplaceMissingValues.
Weka ReplaceMissingValues Data Filter
2. Click the “Apply” button to apply the filter to your dataset.
Click “mass” in the “attributes” section and review the details of the “selected attribute”.
Notice that the 11 attribute values that were marked Missing have been set to the mean value of the
distribution.
Weka Imputed Values
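The imputation itself is simple arithmetic: a numeric attribute gets the mean of its observed values, and a nominal attribute gets its most frequent value (the mode). The sketch below shows both in plain Python. The helper names and sample values are hypothetical, with NaN and None standing in for Weka's Missing marker.

```python
import math
from collections import Counter

def impute_mean(values):
    """Replace NaN entries with the mean of the observed (non-NaN) values."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]

def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# Hypothetical numeric and nominal attributes with missing entries.
mass = [30.0, math.nan, 20.0, math.nan, 40.0]
label = ["positive", None, "negative", "positive"]

imputed_mass = impute_mean(mass)
imputed_label = impute_mode(label)

print(imputed_mass)
print(imputed_label)
```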
EXPERIMENT-2
Aim: To create a decision tree from a training data set using the Weka mining tool.
Tools/ Apparatus: Weka mining tool.
Procedure:
1) Open the Weka GUI Chooser.
2) Select Explorer under Applications.
3) Select the Preprocess tab.
4) Click “Open file” and browse to the file “bank.csv” that is already stored on the system.
5) Go to the Classify tab.
6) Click the “Choose” button and select J48 under trees. J48 is Weka's Java implementation of the C4.5
algorithm.
7) Select the test option “Use training set”.
8) If needed, select the class attribute.
9) Click Start.
10) The output details now appear in the Classifier output.
11) Right-click the entry in the Result list and select the “Visualize tree” option.
Sample output:
The decision tree constructed by using the implemented C4.5 algorithm
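To give a feel for how C4.5 decides which attribute to split on, the sketch below computes the gain ratio, the criterion C4.5 uses to rank candidate splits. It is a minimal plain Python illustration, not J48's code: the attribute names and the four sample rows are hypothetical and are not taken from bank.csv.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by the split information."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical training rows; "buys" is the class attribute.
rows = [
    {"married": "yes", "car": "yes", "buys": "no"},
    {"married": "yes", "car": "no", "buys": "no"},
    {"married": "no", "car": "yes", "buys": "yes"},
    {"married": "no", "car": "no", "buys": "yes"},
]

best = max(["married", "car"], key=lambda a: gain_ratio(rows, a, "buys"))
print("best split attribute:", best)
```

In this toy data "married" separates the classes perfectly while "car" carries no information, so the root of the tree would test "married". The full algorithm repeats this choice recursively on each branch and then prunes the tree.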