0% found this document useful (0 votes)

24 views19 pages

131953194aams Vol 196 April 2020 A3 p451-469 Kanwal Preet Singh Attwal

Uploaded by

forlocalaccc

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views19 pages

131953194aams Vol 196 April 2020 A3 p451-469 Kanwal Preet Singh Attwal

Uploaded by

forlocalaccc

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Advances and Applications in Mathematical Sciences

Volume 19, Issue 6, April 2020, Pages 451-469

EXPLORING DATA MINING TOOL - WEKA AND USING

WEKA TO BUILD AND EVALUATE PREDICTIVE
MODELS

KANWAL PREET SINGH ATTWAL and AMARDEEP SINGH DHIMAN

Department of Computer Science and Engineering,

Punjabi University
Patiala, India
E-mail: [email protected]
[email protected]

Abstract

WEKA is a Java based software suite which implements a large number of machine
learning algorithms. It can be used for performing different Data Mining tasks such as Data
Preprocessing, Classification, Clustering, Association Rule mining and Visualization. This paper
is an attempt to acquaint the researcher with the WEKA interface. The study explores the
different file formats supported by WEKA and also discuss the methods of converting excel files
to WEKA supported file formats. The paper also illustrates how to use WEKA filters to divide
the dataset into two categories – the training dataset and the test dataset. The training dataset
is then used to build a predictive model and the test dataset is used to evaluate the performance
of the model. The study also shows how the model can be used for the prediction of the class of
those tuples/objects, the class label of which is not known.

1. Introduction

WEKA which stands for “Waikato Environment for Knowledge Analysis”

is a Java based open-source Data Mining Tool developed by the University of
Waikato. It is an amalgamation of Machine Learning Algorithms and Data
Preprocessing Tools. The implementations of Machine Learning Algorithms
are provided by the WEKA, which are easily applicable to the dataset.
Besides, various tools for transforming datasets – as the algorithms for
discretisation and sampling – are also available in WEKA. A dataset can be
preprocessed, fed into a learning scheme, and the resulting classifier as well
2010 Mathematics Subject Classification: 68P15.
Keywords: WEKA tool, Data Mining, Predictive Model, Classifier, ARFF file format, Model
Evaluation.
Received 30 October 2019; Accepted 22 December 2019
452 K. P. S. ATTWAL and A. S. DHIMAN

as its performance can be analysed. WEKA provides both – GUI and a

Command Line Interface to run these algorithms on your dataset. The user
does not have to write any code. WEKA requires Java installed on the
computer system. The latest stable version of WEKA –3.8–can be downloaded
from the link - http://www.cs.waikato.ac.nz/ml/weka/downloading.html. The
WEKA workbench includes implementation of algorithms for performing the
main data mining tasks: Regression, Classification, Clustering, Association
Rule Mining and Attribute Selection (Witten, Frank, and Hall, 2011). The
following table summarizes the different Data Mining Tasks that can be
performed in WEKA.

Table 1. Data Mining Tasks available in WEKA.

Data Mining Task Description Examples

Pre-Processing of Discretizing, Nominal to

Preparing a dataset for analysis
Data Binary

Identify groups (i.e., clusters) of

Clustering K-Means
similar observations

Bayes Net, KNN,

Examining the features of a newly
Decision Tree, Neural
Classification presented object and assigning a
Networks, Perceptron,
predefined class to it
SVM

Predicting numeric values for a

Linear Regression,
Regression particular attribute of a newly
Isotonic Regression
presented object

Association rule Finding relationships among values of Apriori Algorithm,

mining different attributes in a database Predictive Accuracy

Visually represent data mining Cluster assignments,

Visualization
results ROC curves

The modular, object-oriented architectural design of WEKA workbench

permits an easy addition of new classifiers, filters, clustering and other
algorithms. A set of abstract Java classes have been designed and placed in a
corresponding top-level package. This is done for every major component.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  453

All classifiers are included in subpackages of the top level “classifiers”

package and extend a “Classifier” which is a common base class. The
Classifier class recommends a public interface and a set of rules. According to
the functionality or the purpose, components are grouped together by sub-
packages. For instance, filters are first divided into supervised and
unsupervised filters, and further on the basis of their operations i.e. either it
is attribute based or instance based. Classifiers are arranged in accordance
with the general type of learning algorithm, hence there are subpackages for
Bayesian methods, tree inducers, rule learners, etc. (Maimon and Rokach,
2010).

“Core” which are supporting classes in a top level package is a

representation of attributes and instances. It is one of the many common
utility methods. The classes and data structures in this package read data
sets. All the components depend on these supporting classes. The core
package includes additional interfaces which may be implemented by the
components in order to support extra functionality.

When WEKA has been installed, weka-3-8 (or weka-version) folder is

found in Program Files, containing a weka.jar file. WEKA stores pre-written
packages, classes, and interfaces with their respective methods, fields and
constructors in the form of Java API. This API is stored in weka.jar file.

2. Supported File Formats

WEKA does not accept files in normal xlsor xlsx format. ARFF –
“Attribute-Relation File Format” is the default file format of WEKA, although
it also accepts files in comma-separated value (CSV) format, C4.5 format, etc.
ARFF file is an ASCII text file; it gives a list of instances that share a set of
attributes(Bouckaert, et al., 2016).The ARFF files have two distinct sections –
the Header Section and the Data Section. The details of the relation, the
attributes, and their types are contained in the reader of the ARFF file and
the Data Section contains the relevant data. Comments can be added by
using percent sign (%),the dataset’s name by using the @relation tag and the
attribute information by using @attributetag. The first line in the ARFF file is
the relation name. The format is:

@relation <relation-name>

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
454 K. P. S. ATTWAL and A. S. DHIMAN

where<relation-name> is a string. If the name contains spaces, the string

should be quoted.

The format for the @attribute statement is:

@attribute <attribute-name><datatype>

Weka supports four types of data:

 Numeric

 Nominal

 string

 date

Numeric attributes may either be real or integer numbers.

Nominal attributes are used when the value of an attribute is chosen

from a pre-defined list. A nominal attribute is defined by providing a
“nominal-specification” listing the possible values as: <nominal-name1>,
<nominal-name2>, <nominal-name3>, ……

For Iris dataset, the class value is defined as follows:

@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

Values that contain spaces must be quoted.

String is used to create attributes that contain arbitrary text values.

Date attribute is used to store dates in different formats. The declarations

take the following form:

@attribute <name> date [<date-format>]

Here<name> is the name for the attribute; <date-format> is an optional

string which specifies the way of parsing and printing the date values. The
default dates should be specified in the data section as the corresponding
string representations of the date/time.

The ARFF Data Section of the file contains the data declaration line and
the actual instance lines. A detailed description of ARFF files can be obtained
from the Data section of WEKA documentation at the download of WEKA.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  455

2.1. Conversion from xls/xlsv to csv format

Data stored in an Excel worksheet can be easily converted into csv/ARFF

format. The following steps are to be followed to convert a data spreadsheet
into csv format:

1. Open the data spreadsheet in MS Excel.

2. Select File->Save as

3. In Sava as type tab Select CSV (Comma Delimited)(*.csv)

4. The file has been saved in CSV format. You can load it in WEKA. The
values in the first row will be taken as Attribute names and the subsequent
rows will be treated as data instances.

Figure 1. Iris, xis.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
456 K. P. S. ATTWAL and A. S. DHIMAN

Figure 2. Iris.csv.

2.2. Conversion from xls/xlsv to ARFF format

The following steps are to be followed to convert a data spreadsheet into

ARFF format:

1. Repeat the above four steps to convert the data spreadsheet into csv
format.

2. Open the above-created csv file in Notepad/Wordpad so that it can be

edited.

3. At the top of the file use @relation tag to define the name of the
dataset.

4. Delete the first row of the file which corresponds to attribute names.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  457

5. Use @attribute tag followed by attribute name and type as explained in

the previous section to define all the attributes of the dataset.

6. Insert @data tag one row above the data instances.

7. Save and close the file.

8. Rename the file and change its extension from csv to ARFF.

9. The file is ready to be loaded into WEKA.

Figure 3. Iris.arff.

3. The WEKA GUI

When WEKA is launched, the Weka GUI chooser appears on the screen.
The GUI Chooser includes of five buttons corresponding to the five major
Weka applications and four menus. A screen shot of Weka GUI chooser is
given in Figure 4. The buttons can be used to start the following applications:

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
458 K. P. S. ATTWAL and A. S. DHIMAN

Figure 4. WEKA GUI.

1. Explorer: The main access to WEKA’s GUI is obtained through its

Explorer. The researcher can use menu selection and form filling to access all
the facilities provided by the Explorer.

2. Experimenter: It provides a platform for experimentation. Analysis

can be made by applying multiple algorithms on the same dataset to check
which algorithm is performing better for a given dataset. An algorithm can be
evaluated on different datasets. So, the experimenter allows large scale
comparison of predictive performances of learning algorithms.

3. Knowledge Flow: An alternative option to the Explorer is the

Knowledge Flow interface. Users can lay out and connect widgets
representing WEKA components such as the data sources, data sinks,
evaluation, visualization, etc. It allows us to layout filters, classifiers,
evaluators interactively on a 2-D campus.

4. Workbench: It is a new feature that has been launched from Weka

3.8.0 (Bouckaert, et al., 2016). It is an all-in-one application which combines
all the other Weka GUI’s within the perspective of user-selection.

5. Simple CLI: It permits a direct execution of WEKA commands by

providing a simple command-line interface.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  459

3.1. Explorer

The researcher can use menu selection and form filling to access all the
facilities provided by the Explorer. The basic functions supported are given by
six tabs at the top of the Explorer. Initially, the user is in Preprocess tab.

Figure 5. The Weka Explorer.

3.1.1. Preprocessor

Of all the tabs only the first tab is active at the start of the explorer,
because dataset is required to be opened before the exploration of the data. At
the click of the button “Open file”, a standard dialog appears from which a file
can be selected, when the file is loaded, the screen shown in Figure appears
which displays information of the dataset. The dataset has five attributes and
one hundred and fifty instances (center left). The name of the attributes is
also given. At the click of the attributes, the information about that attribute
appears. In Figure, the class attribute is selected. In the selected attribute
section, the entire information about class attribute is shown on the right side
– the name of the attribute is class, data type is nominal, there are no
missing values, it has three distinct values and none of the instances has a
unique value. The count of different nominal values is also given. By default,
Weka picks the last attribute as the class attribute, though a different class
attribute can also be selected.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
460 K. P. S. ATTWAL and A. S. DHIMAN

An attribute can be deleted at the click of its checkbox and the “Remove”
button. “All” selects all the attributes, “None” selects none, the current
selection can be inverted using “Invert”, and “Pattern” selects the attributes
that have similarity with a user-supplied regular expression. The “Undo”
button helps in reverting the changes. The “Edit” button provides an editor
which allows the inspection of the data, search for particular values and their
editing, and in deleting the instances and the attributes. Corresponding
context menus appears at the right click on values and column headers
(Witten, Frank, & Hall, 2011).

Filters can be defined in preprocess section. This may transform the data
in various ways. The required filters are set up using the filter box. At its left
is a “Choose” button which helps in the selection of one of the filters in
WEKA. After this selection, the name of filter and its options appear in the
field next to the “Choose” button. A click on this box brings up a “Generic
Object Editor” dialog box (Bouckaert, et al., 2016).

Filters are of two types – Supervised and Unsupervised, each of which are
further of two types – Attribute and Instance. The Attribute filters can be
applied to the attributes of the dataset such as to add an attribute, remove an
attribute, etc. Unsupervised filters can be applied to the instances of the
dataset such as to randomize the instances or to remove instances.

3.1.2. Classifier

“Classification is the process of finding a model (or function) that

describes and distinguishes data classes or concepts, for the purpose of being
able to use the model to predict the class of objects whose class label is
unknown”(Han, Kamber, & Pei, 2011). Classification is a data mining task of
predicting the value of a categorical variable. This is done by building a model
based on one or more numerical and/or categorical variables. The
classification process uses a training set to train a model to predict unseen
data. In this study we train, evaluate, and apply a classifier to classify flowers
into their appropriate species. Error! Reference source not found. shows
that data from file Iris-train.arff is loaded and a learning algorithm such as
OneR, IBK (K Nearest Neighbours), J48 (C4.5), etc can be applied to this
training set to create a model. This model can be applied to predict the class
new data sets whose class attribute is unknown.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  461

Figure 6. Building a Classifier.

The classify section contains the “Classifier” box at its top. The classifier
box contains a text field which shows names of the classifier selected, and its
options. The options can be configured by clicking the text box and then the
dialog box. The result of applying the chosen classifier will be tested
according to the options that are set by clicking in the “Test options” box.
There are four test modes:

1. Use Training Set: In this case all the dataset is used to build a model
for the classifier. Based on the classifier, the classes of all the instances in the
dataset are evaluated again and the accuracy of the classifier is evaluated
based on how well it predicts the class of instances against their actual
classes.

2. Supplied Test Set: All the dataset is used to build a model for the
classifier. A separate test file is loaded and the evaluation of the classifier is
done on the basis of the authenticity of the prediction of the class of a set of
instances loaded from that file. The file to test on can be chosen by clicking
the set button.

3. Cross-validation: In 10-fold Cross – validation, the dataset is divided

into ten sets. In the first run, nine sets are used for training the classifier and
the tenth set is used for testing the classifier. In the next run, another set is
held out as the test data and the remaining nine sets are used as training
sets. Thus each set is held out in turn as the testing set and the process is
repeated ten times. So each data set used once for testing and nine times for
training. The results i.e. the accuracy and estimate of error are calculated as
an average of these ten runs. Weka then runs the algorithm for the eleventh
time to produce a classifier that can be used to predict classes. Stratified

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
462 K. P. S. ATTWAL and A. S. DHIMAN

cross‐validation ensures that each fold has the right proportion of each class
value. Weka uses Stratified cross‐validation by default.
4. Percentage Split: In Percentage split, the data set is divided into two
parts based on value entered in the % field. By default, the value in the %
field is 66. This means that 66% of the data is held out and the remaining
data is used as the test set. The classifier is evaluated on how well it predicts
the class of the test data.
The above four options are used only as a method of evaluation. The final
model is always built using all the training data.
Testing options may be set at the click of the “More options...” button
(Bouckaert, et al., 2016). Some of these options are – Output Model, Output
per-class stats and Output confusion matrix.
Once the classifier, test options and class have all been set, the learning
processis started at the click of the “Start” button. At the completion of the
training, the “Classifier output area” is filled with text which describes the
results of training and testing. It gives a new entry in the “Result list” box. At
the click of the left mouse button and Alt and Shift in the text area, a dialog
ox appears which saves the displayed output in different formats. The output
is split into several sections:
(1) Run information: This gives information about the algorithm (the
.class file executed), the relation name, the number of instances, the number
and name of the attributes and the test mode used.
(2) Classifier model (full training set): This is a textual
representation of the classification model which is produced using the full
training data.
(3) The results of the chosen test mode are thus divided into the following:
(i) Summary: This lists the statistics which summarises the ability of the
classifier to predict the true class of instances accurately under the chosen
test mode.
(ii) Detailed Accuracy by Class: This displays a detailed per-class
break down of the prediction accuracy of classifier.

(iii) Confusion Matrix: It tells about the number of instances assigned

to each class. Its elements depict the number of test examples. The rows of

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  463

the matrix show the actual class of the instances whereas the column shows
the predicted class. The diagonal elements show the number of correctly
classified instances.

(iv) Source code (optional): This section gives lists of the Java source
in the “More options” dialog for “Output source code”.

After the run is complete, a new entry appears in the “Result list” box.
After training several classifiers, several entries are included in the result
list. At the click on the entries, a user can flick back and forth between a
number of generated results. The selected entry from the results can be
deleted at the right click. Results can be viewed in a separate window and
can be saved. Similarly, the model can be saved, loaded and reevaluated on a
selected test set and perform a number of other tasks.

4. Using Filters to create your own Training set and Test set

It is always better to evaluate a classifier with a test set that is different

from the training set. If we use the same training set as a test set also, the
results can be misleading. So given there are enough instances, we can divide
our dataset into a training set and a test set. Both the training set and test
set should have a right proportion of each class value. We can use Weka
filters to divide a given data set into training and test sets using the following
steps:

1. Load the data set by pressing “Open” File button and choosing the
appropriate file. In this case, we will load Iris.arff file. The file has 150
instances of data.

2. Choose Filters->Unsupervised->Instance->Randomize in the “Filters”

tab in Preprocess panel. This filter shuffles the order of instances of the
dataset that are passed through it. You can set the properties of the filter by
clicking once on the selected filter. For example, in this case, you can choose a
different seed for randomization. Click “Apply” to apply this filter on the data
set. You can always revert the changes applied to a data set by pressing
“Undo” button at the top of Preprocess panel. The changes applied to the data
set can be seen in the “Edit” panel by pressing the “Edit” button which is
there at top along the “Undo” button. Edit panel can also be used to apply
changes to data items.
Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
464 K. P. S. ATTWAL and A. S. DHIMAN

3. Choose Filters->Unsupervised->Instance->RemovePercentage in the

filters tab in Preprocess panel. This filter removes a given percentage of a
dataset. By default, the percentage is set to 50. In this case set the percentage
to 20. Click “Apply” to apply this filter on the data set. You will see in the
Current relation information that now there are 120 instances of data left in
the data set. Save these instances as your training data set by pressing
“Save” button and saving the file as Iris-train.arff.

4. Load the original Iris.arff data set again. Choose Filters-

>Unsupervised->Instance->RemovePercentage in the filters tab in Preprocess
panel. Again set the percentage to 20 and set the “invertSelection” property to
“true”. Click “Apply” to apply this filter on the data set. You will see in the
Current relation information that now there are 30 instances of data left in
the data set. Save these instances as your test data set by pressing “Save”
button and saving the file as Iris-test.arff.

5. Using Test data set to Evaluate a Classifier

1. Load Iris-train.arff in the Preprocess panel. Move to “Classify” panel,

set the “Test options” to “Use training set” and build a classifer by pressing
“Start” button. If you go to the summary in “Classifier output”, you will see
that when this classifier is evaluated on the training set, the accuracy is
98.33%.

Figure 7. Classifier Output when used on Training Set.

2. Now set the “Test options” to “Supplied test set” and load Iris-test.arff
by clicking “Set…” button. Click on “More options” button in “Test options”
and in “Output predictions” choose “Plain text”. Right click on the classifier
built in last step in Result list and select “Re-evaluate model on current test

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  465

set” from the menu. Now you will see that percentage accuracy is 90%. So we
can clearly see that when the same training data set is used for evaluating a
classifier, the results can be misleading. The evaluation of the classifier on a
separate test set is always better. If you want to display the actual class and
the predicted class of the test data, then click on “More options” button in
“Test options” and in “Output predictions” choose “Plain text”. Right click on
the classifier in “Result list” and select “Re-evaluate model on current test”
set from the menu.

Figure 8. Classifier Output for Test Set.

Figure 9. Predicted class of Tuples in Test Set.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
466 K. P. S. ATTWAL and A. S. DHIMAN

6. Using Classifier to make Predictions on New Data

Once we have built a model, it can be used to classify new unseen data:

1. Create a file that contains new data i.e. the data for which class is
unknown. This file should have the structure similar to the file which is used
for learning the model. The only difference is that here we don’t know the
value of class variable. Replace the value of the class attribute with “?” for all
instances, question mark represents a missing value in Weka.

2. At the right click of menu on the “Result list” panel, the saved model is
loaded.

3. In the “Test Options”, select “Supplied test set”and load Iris-predict.arff

by clicking “Set…” button. Click on “More options” button in “Test options”
and in “Output predictions” choose “Plain text”. Right click on the classifier
loaded in step 2 in Result list and select “Re-evaluate model on current test
set” from the menu.

Figure 10. Iris-predict.arff.

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  467

Figure 11. Predicted class of Unknown tuples.

Here Error! Reference source not found.shows the file Iris-

predict.arff. The class values for these instances are unknown. So they are
marked as “?” (Missing values). Error! Reference source not found. shows
the result of “Predictions on user test set” in “Classifier output” panel.

7. Conclusion

The paper explores the data mining tool – WEKA. The different file
formats that can be used with WEKA tool are discussed. The default file
format of WEKA is ARFF. The paper shows how to store data in ARFF format
and how to convert data from other file formats into ARFF format. The paper
carries an in depth study of WEKA GUI. Classification task consists of
examining the features of a newly presented object and assigning a
predefined class to it. The study shows how to create and evaluate a classifier
using WEKA tool. The study has been carried out using IRIS dataset which is
provided along the WEKA tool. When the model is checked using the same
dataset that is used to build the model, the accuracy is found to be 98.33%.
This reduces to 90% when different dataset is used to check the accuracy of
the model. Therefore, the dataset is divided into the training dataset and the
test dataset; the first dataset is brought in use to create the model, and the
accuracy of the model is checked with the second dataset. The study also
shows how the model can be used for the prediction of the class of those
tuples/objects, the class label of which is not known.

References

[1] R. R. Bouckaert, E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald, et al. WEKA

Manual for Version 3-8-1. Hamilton, New Zealand: University of Waikato. (2016).

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
468 K. P. S. ATTWAL and A. S. DHIMAN

[2] N. L. Costa, L. A. Llobodanin, I. A. Castro and R. Barbosa, Using Support Vector

Machines and Neural Networks to Classify Merlot Wines from South America,
Information Processing in Agriculture (2019), 265-278.
[3] W. M. Dlamini, A Data Mining Approach to Predictive Vegetation Mapping Using
Probabilistic Graphical Models, Ecological Informatics 6 (2011), 111-124.
[4] M. H. Dunham, Data Mining Introductory and Advanced Topics, Pearson (2012).
[5] H. A. Edelstein, Introduction to Data Mining and Knowledge Discovery. Potomac, MD,
USA: Two Crows Corporation, (2005).
[6] U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, From Data Mining To Knowledge
Discovery in Databases. AI Magazine 17(3) (1996), 37-54.
[7] R. Ghorbani and R. Ghousi, Predictive Data Mining Approaches in Medical Diagnosis: A
Review of Some Diseases Prediction. International Journal of Data and Network Science
(2019), 47-70.
[8] A. Haghverdi, B. Ghahraman, B. G. Leib, I. Pulido-Calvo, M. Kafi, K. Davary, et al.
Deriving Data Mining and Regression Based Water-salinity Production, Computers and
Electronics in Agriculture (2014), 68-75.
[9] M. F. Hall, The WEKA Data Mining Software: An Update, ACM SIGKDD explorations
newsletter 11(1) (2009, June), 10-18.
[10] J. Han, M. Kamber and J. Pei, Data Mining Concepts and Techniques, Waltham, MA,
USA: Morgan Kaufmann Publishers. (2011).
[11] F. Hoppner, Association Rules, In L. Roach and O. Maimon, Data Mining and Knowledge
Discovery Handbook (pp. 299-320). Springer.
[12] D. T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining. Hoboken,
New Jersey: John Wiley & Sons. (2014).
[13] O. Maimon and L. Rokach, Weka-A Machine Learning Workbench for Data Mining. In E.
Frank, M. Hall, G. Holmes, R. Kirkby, B. Pfahringe, I. H. Witte, et al., Data Mining and
Knowledge Discovery Handbook (pp. 1269-1277). London: Springer. (2010).
[14] T. M. Mitchell, Machine Learning. Chennai: McGraw Hill Education. (2018).
[15] A. Mucherino, P. J. Papaorgji and P. M. Pardalos, Data Mining in Agriculture. Springer.
(2009).
[16] D. L. Olson and L. G. Descriptive Data Mining, In Descriptive Data Mining (pp. 129-
130). Singapore: Springer. (2019).
[17] L. Rokach, A Survey of Clustering Algorithms, In L. Rokach and O. Maimon, Data
Mining and Knowledge Discovery Handbook (pp. 269-298). Springer. (2010).
[18] L. Rokach and O. Maimon, Classification Trees, In L. Rokach and O. Maimon, Data
Mining and Knowledge Discovery Handbook (pp. 148-174). Springer. (2010).
[19] L. Rokach and O. Maimon, Supervised Learning, In L. Rokach and O. Maimon, Data
Mining and Knowledge Discovery Handbook (pp. 133-147). Springer. (2010).
[20] L. J. Rutkowski, Stream Data Mining: Algorithms and their Probabilistic Properties.
Springer Nature. (2019).

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020
EXPLORING DATA MINING TOOL - WEKA AND USING WEKA  469

[21] P. Sebastiani, M. M. Abad and M. F. Ramoni, Bayesian Networks, In O. Maimon and L.

Rokach, Data Mining and Knowledge Discovery Handbook (pp. 175-208). Springer.
(2010).
[22] G. M. Weiss and B. D. Davison, Data Mining, In H. Bidgoli, Handbook of Technology
Management (pp. 542-555). Hoboken, NJ: John Wiley and Sons. (2010).
[23] I. H. Witten, E. Frank and M. A. Hall, Data Mining Practical Machine Learning Tools
and Techniques. Burlington, USA: Elsevier. (2011).

Advances and Applications in Mathematical Sciences, Volume 19, Issue 5, April 2020

Weka Tutorial
No ratings yet
Weka Tutorial
45 pages
Weka Data Miningvsem
No ratings yet
Weka Data Miningvsem
7 pages
WEKA Intro
No ratings yet
WEKA Intro
17 pages
Rintro Wekacomplete
No ratings yet
Rintro Wekacomplete
135 pages
Data Mining Unit 5
No ratings yet
Data Mining Unit 5
12 pages
Weka Weka: A - Antony Alex MCA DR G R D College of Science - CBE Tamil Nadu - India
No ratings yet
Weka Weka: A - Antony Alex MCA DR G R D College of Science - CBE Tamil Nadu - India
23 pages
DWDM Lab Manual
No ratings yet
DWDM Lab Manual
55 pages
Data Base Management Key Points
No ratings yet
Data Base Management Key Points
8 pages
DHW Lab (Ex1 To 3)
No ratings yet
DHW Lab (Ex1 To 3)
18 pages
Appendix Weka
No ratings yet
Appendix Weka
17 pages
Weka Overview Slides
No ratings yet
Weka Overview Slides
31 pages
Weka Software Manuala
No ratings yet
Weka Software Manuala
20 pages
CS-703 (B) Data Warehousing and Data Mining Lab
No ratings yet
CS-703 (B) Data Warehousing and Data Mining Lab
50 pages
Introduction To Weka
No ratings yet
Introduction To Weka
39 pages
WEKA Explorer Tutorial
No ratings yet
WEKA Explorer Tutorial
45 pages
Group 3: Elhaine, Jai, Icelle and Marianne
No ratings yet
Group 3: Elhaine, Jai, Icelle and Marianne
17 pages
Weka Lab
No ratings yet
Weka Lab
11 pages
Lecture 12 - Weka Tutorial
No ratings yet
Lecture 12 - Weka Tutorial
84 pages
2.3 Weka Tool
No ratings yet
2.3 Weka Tool
84 pages
Wekappt
No ratings yet
Wekappt
58 pages
Learning To Use We Ka
No ratings yet
Learning To Use We Ka
5 pages
Dinesh DM
No ratings yet
Dinesh DM
34 pages
DWDM File-Final Ver3.pdf 20241230 172003 0000
No ratings yet
DWDM File-Final Ver3.pdf 20241230 172003 0000
54 pages
DM Lab Material
No ratings yet
DM Lab Material
88 pages
Laboratory Manual On: Data Mining
No ratings yet
Laboratory Manual On: Data Mining
41 pages
WEKA Lab Record
No ratings yet
WEKA Lab Record
69 pages
Weka Tutorial
No ratings yet
Weka Tutorial
32 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
50 pages
Weka Tutorial
No ratings yet
Weka Tutorial
8 pages
Mooc On Weka
No ratings yet
Mooc On Weka
59 pages
Dm&pa Lab Manual
No ratings yet
Dm&pa Lab Manual
68 pages
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
No ratings yet
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
42 pages
Overview: Data Mining Methods: WEKA: A Machine Learning Toolkit The Explorer
No ratings yet
Overview: Data Mining Methods: WEKA: A Machine Learning Toolkit The Explorer
41 pages
Data Warehousing and Data Mining Lab Manual
0% (1)
Data Warehousing and Data Mining Lab Manual
30 pages
DWM1 Riya
No ratings yet
DWM1 Riya
16 pages
Lab 04
No ratings yet
Lab 04
7 pages
Workshop 1
No ratings yet
Workshop 1
16 pages
Data Mining Complete Lab Manual - DRSNR
No ratings yet
Data Mining Complete Lab Manual - DRSNR
27 pages
Introduction To Weka: Xingquan (Hill) Zhu
No ratings yet
Introduction To Weka: Xingquan (Hill) Zhu
63 pages
DMBI Exp1: Introduction To WEKA Tool
No ratings yet
DMBI Exp1: Introduction To WEKA Tool
6 pages
WEKA Explorer User Guide For Version 3-4: Richard Kirkby Eibe Frank July 15, 2008
No ratings yet
WEKA Explorer User Guide For Version 3-4: Richard Kirkby Eibe Frank July 15, 2008
13 pages
Introduction To Weka-A Toolkit For Machine Learning
No ratings yet
Introduction To Weka-A Toolkit For Machine Learning
11 pages
Data Warehousing and Data Mining Lab Manual
100% (1)
Data Warehousing and Data Mining Lab Manual
30 pages
DWDM File
No ratings yet
DWDM File
26 pages
DMW Lab Print
No ratings yet
DMW Lab Print
21 pages
5 MIS510 Weka NetDraw
No ratings yet
5 MIS510 Weka NetDraw
33 pages
DWBI Lab Manual 2023-24 Final
No ratings yet
DWBI Lab Manual 2023-24 Final
40 pages
Introduction To WEKA: Data Mining WEKA - What Is It? Weka Uis Integration With Pentaho Projects Based On Weka
No ratings yet
Introduction To WEKA: Data Mining WEKA - What Is It? Weka Uis Integration With Pentaho Projects Based On Weka
27 pages
Data Warehousing Laboratory
0% (1)
Data Warehousing Laboratory
28 pages
Weka: A Tool For Data Preprocessing, Classification, Ensemble, Clustering and Association Rule Mining
No ratings yet
Weka: A Tool For Data Preprocessing, Classification, Ensemble, Clustering and Association Rule Mining
4 pages
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
100% (1)
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
8 pages
An Introduction To WEKA
No ratings yet
An Introduction To WEKA
85 pages
WEKA Practical Protocol
No ratings yet
WEKA Practical Protocol
40 pages
DW 9 Exp 1
No ratings yet
DW 9 Exp 1
43 pages
DataMining Record Using Weka Tool
No ratings yet
DataMining Record Using Weka Tool
55 pages
Weka Experiment
No ratings yet
Weka Experiment
13 pages
Java Data Structures Explained: A Practical Guide with Example
From Everand
Java Data Structures Explained: A Practical Guide with Example
William E. Clark
No ratings yet
JavaScript Data Structures Explained: A Practical Guide with Examples
From Everand
JavaScript Data Structures Explained: A Practical Guide with Examples
William E. Clark
No ratings yet
Scala Data Analysis Cookbook (new): Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes
From Everand
Scala Data Analysis Cookbook (new): Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes
Arun Manivannan
No ratings yet
JAX Essentials: The Complete Guide for Developers and Engineers
From Everand
JAX Essentials: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
DWDM Lab Manual r20
No ratings yet
DWDM Lab Manual r20
97 pages
Data Warehousing Lab Exp 1-3
No ratings yet
Data Warehousing Lab Exp 1-3
24 pages
Weka Lab Manual
No ratings yet
Weka Lab Manual
49 pages
DM Lab External Q.P Model
No ratings yet
DM Lab External Q.P Model
6 pages
NguyenCongSang ITITIU20292 Lab1
No ratings yet
NguyenCongSang ITITIU20292 Lab1
7 pages
COC257-Commercial Applications of Vehicle Image Classification
No ratings yet
COC257-Commercial Applications of Vehicle Image Classification
48 pages
CCS341-DW LAB Manual - Chumma Chumma Practical Notes
No ratings yet
CCS341-DW LAB Manual - Chumma Chumma Practical Notes
89 pages
CA - 605 - MJP Machine Learning Practical Slips
No ratings yet
CA - 605 - MJP Machine Learning Practical Slips
25 pages
Weka (Software)
No ratings yet
Weka (Software)
4 pages
MTECH - FE - Mini Project-Concrete Compressive Strength Using Data Driven Techniques
No ratings yet
MTECH - FE - Mini Project-Concrete Compressive Strength Using Data Driven Techniques
55 pages
10 Best Data Mining Tools in 2022
No ratings yet
10 Best Data Mining Tools in 2022
8 pages
Vijay DMPM
No ratings yet
Vijay DMPM
23 pages
Musa IEEE
No ratings yet
Musa IEEE
6 pages
DM Lab Manualiii I 1 Mrits
No ratings yet
DM Lab Manualiii I 1 Mrits
39 pages
Data Mining and Business Analytics
No ratings yet
Data Mining and Business Analytics
7 pages
Experiment WEKA
No ratings yet
Experiment WEKA
16 pages
Lab 01-PhamBinhDuong ITCSIU21054
No ratings yet
Lab 01-PhamBinhDuong ITCSIU21054
9 pages
Exp 6
No ratings yet
Exp 6
12 pages
DWDM Lab Manual Final Updated New Finalll
100% (1)
DWDM Lab Manual Final Updated New Finalll
60 pages
ML Lab External QP
No ratings yet
ML Lab External QP
2 pages
DWDM Lab File
No ratings yet
DWDM Lab File
29 pages
EAR4 Manual
No ratings yet
EAR4 Manual
7 pages
Data Warehousing Lab Record Final
No ratings yet
Data Warehousing Lab Record Final
45 pages
20ai402 Data Analytics Unit-1
No ratings yet
20ai402 Data Analytics Unit-1
52 pages
Data Mining - Session #1 - Unlocked
No ratings yet
Data Mining - Session #1 - Unlocked
22 pages

131953194aams Vol 196 April 2020 A3 p451-469 Kanwal Preet Singh Attwal

Uploaded by

131953194aams Vol 196 April 2020 A3 p451-469 Kanwal Preet Singh Attwal

Uploaded by

Advances and Applications in Mathematical Sciences

Volume 19, Issue 6, April 2020, Pages 451-469

EXPLORING DATA MINING TOOL - WEKA AND USING

KANWAL PREET SINGH ATTWAL and AMARDEEP SINGH DHIMAN

Department of Computer Science and Engineering,

WEKA which stands for “Waikato Environment for Knowledge Analysis”

as its performance can be analysed. WEKA provides both – GUI and a

Table 1. Data Mining Tasks available in WEKA.

Data Mining Task Description Examples

Pre-Processing of Discretizing, Nominal to

Identify groups (i.e., clusters) of

Bayes Net, KNN,

Predicting numeric values for a

Association rule Finding relationships among values of Apriori Algorithm,

Visually represent data mining Cluster assignments,

The modular, object-oriented architectural design of WEKA workbench

All classifiers are included in subpackages of the top level “classifiers”

“Core” which are supporting classes in a top level package is a

When WEKA has been installed, weka-3-8 (or weka-version) folder is

2. Supported File Formats

where<relation-name> is a string. If the name contains spaces, the string

The format for the @attribute statement is:

Weka supports four types of data:

Numeric attributes may either be real or integer numbers.

Nominal attributes are used when the value of an attribute is chosen

For Iris dataset, the class value is defined as follows:

@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

Values that contain spaces must be quoted.

String is used to create attributes that contain arbitrary text values.

Date attribute is used to store dates in different formats. The declarations

@attribute <name> date [<date-format>]

Here<name> is the name for the attribute; <date-format> is an optional

2.1. Conversion from xls/xlsv to csv format

Data stored in an Excel worksheet can be easily converted into csv/ARFF

1. Open the data spreadsheet in MS Excel.

3. In Sava as type tab Select CSV (Comma Delimited)(*.csv)

Figure 1. Iris, xis.

2.2. Conversion from xls/xlsv to ARFF format

The following steps are to be followed to convert a data spreadsheet into

2. Open the above-created csv file in Notepad/Wordpad so that it can be

5. Use @attribute tag followed by attribute name and type as explained in

6. Insert @data tag one row above the data instances.

7. Save and close the file.

9. The file is ready to be loaded into WEKA.

3. The WEKA GUI

Figure 4. WEKA GUI.

1. Explorer: The main access to WEKA’s GUI is obtained through its

2. Experimenter: It provides a platform for experimentation. Analysis

3. Knowledge Flow: An alternative option to the Explorer is the

4. Workbench: It is a new feature that has been launched from Weka

5. Simple CLI: It permits a direct execution of WEKA commands by

Figure 5. The Weka Explorer.

“Classification is the process of finding a model (or function) that

Figure 6. Building a Classifier.

3. Cross-validation: In 10-fold Cross – validation, the dataset is divided

(iii) Confusion Matrix: It tells about the number of instances assigned

It is always better to evaluate a classifier with a test set that is different

2. Choose Filters->Unsupervised->Instance->Randomize in the “Filters”

3. Choose Filters->Unsupervised->Instance->RemovePercentage in the

4. Load the original Iris.arff data set again. Choose Filters-

5. Using Test data set to Evaluate a Classifier

1. Load Iris-train.arff in the Preprocess panel. Move to “Classify” panel,

Figure 7. Classifier Output when used on Training Set.

Figure 8. Classifier Output for Test Set.

Figure 9. Predicted class of Tuples in Test Set.

6. Using Classifier to make Predictions on New Data

3. In the “Test Options”, select “Supplied test set”and load Iris-predict.arff

Figure 10. Iris-predict.arff.

Figure 11. Predicted class of Unknown tuples.

Here Error! Reference source not found.shows the file Iris-

[1] R. R. Bouckaert, E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald, et al. WEKA

[2] N. L. Costa, L. A. Llobodanin, I. A. Castro and R. Barbosa, Using Support Vector

[21] P. Sebastiani, M. M. Abad and M. F. Ramoni, Bayesian Networks, In O. Maimon and L.

You might also like