Experiment: Installation of WEKA Tool
Aim: A. Investigate the application interfaces of the Weka tool.
Introduction
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research.
Advantages of Weka include:
1. Free availability under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
3. A comprehensive collection of data preprocessing and modeling techniques.
4. Ease of use due to its graphical user interfaces.
Description:
Open the program. Once the program has been loaded on the user's machine, it is opened by navigating to the program's start option, which depends on the user's operating system.
Figure 1.1 is an example of the initial opening screen on a computer. There are four options available on this initial screen:
Fig: 1.1 Weka GUI (the Weka GUI Chooser for the Waikato Environment for Knowledge Analysis, Version 3.8.0, (c) 1999-2016 The University of Waikato, Hamilton, New Zealand, with the Explorer, Experimenter, KnowledgeFlow and Simple CLI buttons)
1. Explorer - the graphical interface used to conduct experimentation on raw data. After clicking the Explorer button, the Weka Explorer interface appears.
Fig: 1.2 Pre-processor (the Preprocess tab of the Weka Explorer before any data is loaded, with the Open file, Open URL, Open DB and Generate buttons, a Filter chooser, and empty Current relation and Selected attribute panels)
Information Technology Page 2
[Screenshot: Select attributes tab with CfsSubsetEval -P 1 -E 1 as the attribute evaluator and BestFirst as the search method. CfsSubsetEval "evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them"; BestFirst "searches the space of attribute subsets by greedy hillclimbing augmented with a backtracking facility" (options shown include direction Forward, lookupCacheSize 1 and searchTermination 5).]
Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or on recorded media.
Open URL - provides a mechanism to locate a file or data source at a different location specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under experimentation.
Fig: 1.3 Choosing ZeroR from Classify
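ZeroR, which the Classify tab selects by default, ignores every attribute and always predicts the most frequent class (or, for a numeric class, the mean). It is useful as a baseline that any real learning scheme should beat. A minimal Python sketch of the nominal-class case follows; it is an illustration rather than Weka's code, and the function names and labels are hypothetical.

```python
from collections import Counter

def zero_r_fit(labels):
    """Return the majority class of the training labels."""
    # most_common orders equal counts by first occurrence, so ties go to the earlier label
    return Counter(labels).most_common(1)[0][0]

def zero_r_predict(majority, n):
    """Predict the same class for every one of n instances."""
    return [majority] * n

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

train = ["yes", "no", "yes", "yes", "no"]   # hypothetical training labels
test = ["yes", "no", "no", "yes"]           # hypothetical test labels
model = zero_r_fit(train)
print(model, accuracy(zero_r_predict(model, len(test)), test))  # yes 0.5
```

A scheme that cannot beat this 50% on the same split has learned nothing beyond the class distribution.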
Again, there are several options to be selected inside the Classify tab. The Test options give the user the choice of four different test modes on the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split
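The last two test modes decide how instances are divided between training and testing. As a minimal sketch over a list of instance indices: a percentage split shuffles once and trains on the first part, while k-fold cross-validation rotates each of k folds into the test role exactly once. The Explorer's default is 10-fold cross-validation; the 66% split and the seed here are example values, and Weka additionally stratifies folds by class for nominal classes, which this sketch omits.

```python
import random

def percentage_split(indices, percent, seed=1):
    """Shuffle, then train on the first `percent`% and test on the rest."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * percent / 100)
    return shuffled[:cut], shuffled[cut:]

def k_fold(indices, k, seed=1):
    """Yield (train, test) pairs; every index is tested exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

data = range(150)                       # e.g. the 150 instances of the Iris data
train, test = percentage_split(data, 66)
print(len(train), len(test))            # 99 51
for train, test in k_fold(data, 10):
    assert len(test) == 15 and len(train) == 135
```

Cross-validation costs k training runs but uses every instance for testing once, which is why it is usually preferred over a single split on small datasets.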
3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process that is used to identify commonalities or clusters of occurrences within the data set and produce information for the user to analyze.
[Screenshot: Cluster tab with EM chosen as the clusterer; the Cluster mode panel offers Use training set, Supplied test set, Percentage split and Classes to clusters evaluation, plus Store clusters for visualization and Ignore attributes.]
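Weka's clusterer list includes SimpleKMeans alongside EM. As a sketch of what "identifying clusters" means, here is plain k-means (Lloyd's algorithm) on one-dimensional values: assign each point to its nearest centre, move each centre to the mean of its points, and repeat until no assignment changes. This is a simplified illustration with fixed initial centres and made-up data, not Weka's SimpleKMeans implementation.

```python
def k_means_1d(points, centres, max_iter=100):
    """Lloyd's algorithm on 1-D data; returns (centres, assignments)."""
    centres = list(centres)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point
        new_assignment = [min(range(len(centres)), key=lambda c: abs(p - centres[c]))
                          for p in points]
        if new_assignment == assignment:
            break                                # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centre moves to the mean of its members
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                          # an empty cluster keeps its old centre
                centres[c] = sum(members) / len(members)
    return centres, assignment

points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]        # hypothetical values
centres, labels = k_means_1d(points, centres=[1.0, 2.0])
print(centres, labels)   # [1.5, 9.0] [0, 0, 0, 1, 1, 1]
```

EM differs in that it assigns each instance a probability of belonging to each cluster instead of a single hard label.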
4. Associate - used to apply different rules to the data file that identify associations within the data. The Associate tab opens a window to select the options for associations within the dataset.
[Screenshot: Associate tab with Apriori chosen as the associator: Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1]
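Association rules are judged by support (the fraction of transactions containing every item in the rule) and confidence (the support of the whole rule divided by the support of its left-hand side). The sketch below shows how those two measures are computed by brute-force enumeration of single-item rules over a made-up basket list; it deliberately does not implement Apriori's level-wise candidate pruning. The 0.9 minimum confidence mirrors the -C 0.9 in the Apriori options line; the 0.4 minimum support is an arbitrary example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Brute-force X -> Y rules over item pairs (no Apriori pruning)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b}, transactions)
        if both < min_support:
            continue                     # pair too rare to yield any rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"},
           {"bread", "butter"}, {"milk"}, {"bread", "butter", "milk"}]
for lhs, rhs, s, c in pair_rules(baskets, min_support=0.4, min_confidence=0.9):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
# butter -> bread  support=0.60 confidence=1.00
```

Note that the reverse rule bread -> butter has the same support but only 0.75 confidence, so it is filtered out.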
5. Select attributes - used to apply different rules to reveal changes based on the inclusion or exclusion of selected attributes from the experiment.
6. Visualize - used to see what the various manipulations produced on the data set in a 2D format, as scatter plot and bar graph output.
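The Select attributes tab pairs an evaluator, which scores a subset of attributes, with a search method, which decides which subsets to try. A minimal sketch of greedy forward selection follows: start from no attributes and keep adding whichever single attribute most improves the subset score, stopping when no addition helps. This is a simplified relative of Weka's BestFirst search (it has no backtracking), and the scoring function is a hypothetical stand-in that only imitates the idea of rewarding relevance and penalising redundancy; it is not CfsSubsetEval's actual merit formula.

```python
def forward_select(attributes, score):
    """Greedy hill-climbing over attribute subsets (no backtracking)."""
    selected, best = [], score([])
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            return selected, best
        # Try adding each remaining attribute and keep the best extension
        top_score, top_attr = max((score(selected + [a]), a) for a in candidates)
        if top_score <= best:
            return selected, best        # no single addition improves the subset
        selected, best = selected + [top_attr], top_score

# Hypothetical merit values: reward relevant attributes, penalise a redundant pair
relevance = {"petallength": 0.9, "petalwidth": 0.85, "sepallength": 0.5, "sepalwidth": 0.2}
redundant = {frozenset({"petallength", "petalwidth"})}

def toy_score(subset):
    gain = sum(relevance[a] for a in subset)
    penalty = sum(0.8 for i, a in enumerate(subset) for b in subset[i + 1:]
                  if frozenset({a, b}) in redundant)
    return gain - 0.3 * len(subset) - penalty

sel, s = forward_select(list(relevance), toy_score)
print(sel, round(s, 2))   # ['petallength', 'sepallength'] 0.8
```

Because petalwidth is marked redundant with petallength, the search skips it even though it is individually the second most relevant attribute.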
2. Experimenter - this option allows users to conduct different experimental variations on data sets and perform statistical manipulation. The Weka Experiment Environment enables the user to create, run, modify, and analyze experiments in a more convenient manner than is possible when processing the schemes individually. For example, the user can create an experiment that runs several schemes against a series of datasets and then analyze the results to determine whether one of the schemes is (statistically) better than the others.
Fig: 1.6 Weka experiment (the Experiment Environment's Setup tab in simple configuration mode, with Results Destination, Experiment Type, Iteration Control, Datasets and Algorithms panels)
Results destination: ARFF file, CSV file, JDBC database.
Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters
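Deciding that one scheme is "(statistically) better" means running a significance test on paired results: both schemes are scored on the same datasets and runs, and the per-run differences are tested. Below is a minimal sketch of the ordinary paired t statistic over per-run accuracies. Weka's Analyse panel uses a corrected variant of the paired t-test that accounts for overlapping training sets, so this uncorrected version is illustrative only, and the accuracy values are made up.

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two schemes on the same runs."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference / standard error of the differences (needs non-identical diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical accuracies of two schemes over the same 5 runs
scheme_a = [0.96, 0.94, 0.95, 0.97, 0.93]
scheme_b = [0.92, 0.91, 0.93, 0.92, 0.90]
t, df = paired_t(scheme_a, scheme_b)
print(f"t = {t:.2f} with {df} degrees of freedom")   # t = 6.67 with 4 degrees of freedom
```

With 4 degrees of freedom the two-sided 5% critical value is about 2.776, so this difference would count as significant under the uncorrected test.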
3. Knowledge Flow - basically the same functionality as the Explorer, but with drag-and-drop functionality. The advantage of this option is that it supports incremental learning from previous results.
4. Simple CLI - provides users without a graphical interface option the ability to execute commands from a terminal window.
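Incremental learning means the model is updated one instance at a time instead of being rebuilt from the whole dataset. A minimal sketch, assuming a stream of (attribute value, class) pairs with a single nominal attribute: a hypothetical count-based model that predicts the class most often seen with a given value so far, falling back to the overall majority for unseen values. It illustrates the update pattern only and is not Knowledge Flow's API.

```python
from collections import Counter, defaultdict

class IncrementalCounts:
    """Predicts the class most often seen with an attribute value; updated per instance."""

    def __init__(self):
        self.by_value = defaultdict(Counter)   # attribute value -> class counts
        self.overall = Counter()               # class counts over all instances seen

    def update(self, value, label):
        # One instance at a time: earlier data never has to be revisited
        self.by_value[value][label] += 1
        self.overall[label] += 1

    def predict(self, value):
        # Unseen values fall back to the overall majority; nothing seen yet gives None
        counts = self.by_value.get(value) or self.overall
        return counts.most_common(1)[0][0] if counts else None

model = IncrementalCounts()
stream = [("sunny", "no"), ("overcast", "yes"), ("sunny", "no"), ("rainy", "yes")]
for value, label in stream:
    model.update(value, label)
print(model.predict("sunny"), model.predict("overcast"), model.predict("foggy"))  # no yes no
```

The model can keep predicting between updates, which is what makes this style suitable for data that arrives as a stream.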
B. Explore the default datasets in the Weka tool.
Weka provides a number of small, common machine learning datasets that you can use to practice on.
Click the "Open file..." button to open a data set and double-click on the "data" directory.
Select the "[Link]" file to load the Iris dataset.
Fig: 1.7 Different data sets in Weka (the data folder under Program Files\Weka-3-8, holding 25 ARFF files such as breast-cancer.arff, contact-lenses.arff, credit-g.arff, ReutersCorn-test.arff, ReutersCorn-train.arff, ReutersGrain-train.arff and segment-challenge.arff)
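The files in the data folder use ARFF (Attribute-Relation File Format): a header with an @relation line and one @attribute line per column, followed by comma-separated rows after @data, with % starting comment lines. A minimal reader sketch for the simple dense case follows; it ignores quoted values, sparse rows and string/date attributes. The sample header is Iris-style and its two rows are the first two Iris-setosa instances, included only as input.

```python
def read_arff(text):
    """Parse a simple dense ARFF file into (relation, attribute names, rows)."""
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):        # skip blank lines and comments
            continue
        lower = line.lower()                         # ARFF keywords are case-insensitive
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

sample = """% Iris-style header (excerpt)
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""
relation, names, rows = read_arff(sample)
print(relation, names[-1], len(rows))   # iris class 2
```

By convention Weka treats the last attribute (here class) as the class attribute when a file is opened in the Explorer.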
Exercise:
1. Normalize the data using min-max normalization.
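Min-max normalization rescales each numeric attribute linearly so that its minimum maps to 0 and its maximum to 1: v' = (v - min) / (max - min). In the Explorer this is available in the Preprocess tab as the unsupervised attribute filter Normalize. The sketch below shows the arithmetic on one column with an optional target range; the sample values are examples, and mapping a constant column to the lower bound is an assumption made here to avoid division by zero.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: assumption, map everything to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

sepal_length = [5.1, 4.9, 4.7, 7.0, 6.3]    # example values
print([round(v, 3) for v in min_max(sepal_length)])   # [0.174, 0.087, 0.0, 1.0, 0.696]
```

For instance 5.1 becomes (5.1 - 4.7) / (7.0 - 4.7) = 0.4 / 2.3, about 0.174. Apply the same min and max learned from training data to any test data so both are on the same scale.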