0% found this document useful (0 votes)

120 views

Data Warehousing and Data Mining Lab

The document describes introducing the WEKA data mining tool. It discusses launching WEKA and exploring its two main interfaces: Explorer and Experimenter. Explorer provides tools for data exploration, classification, clustering, and more. Experimenter allows running experiments, comparing different learning algorithms, and analyzing results. Practical examples demonstrate loading data, preprocessing like attribute filtering and discretization, and running classification experiments in WEKA.

Uploaded by

Aman Jolly

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

120 views

Data Warehousing and Data Mining Lab

Uploaded by

Aman Jolly

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 53

DATA WAREHOUSING

AND DATA MINING LAB

SUBMITTED TO: MADE BY:

Hitesh Shivani Sazawal
Department of COE 2k14/CO/115
DTU B.Tech (COE)
DTU

1
PRACTICAL 1

Objective: Introduction about launching the WEKA (Waikato

Environment for Knowledge Analysis) tool and study WEKA
Explorer and WEKA Experimenter.

The WEKA GUI Chooser provides a starting point for launching WEKA’S main GUI
applications and supporting tools. The GUI Chooser consists of four buttons—one for each
of the four major WEKA applications—and four menus.

The buttons can be used to start the following applications:

Explorer: An environment for exploring data with WEKA.

Experimenter: An environment for performing experiments and conducting statistical tests

between learning schemes.

Knowledge Flow: It supports essentially the same functions as the explorer but with a drag
and drop interface. One advantage is that it supports incremental learning.

Simple CLI: Provides a simple command-line interface that allows direct execution of
WEKA commands for operating systems that do not provide their own command line
interface.

The menu consists of four sections like Program, Tools, Visualization, and Help.

EXPLORER:

It is a user interface which contains a group of tabs just below the title bar. The tabs are
as follows:
1. Preprocess
2. Classify

3. Cluster

4. Associate

2
5. Select Attributes

6. Visualize

3
EXPERIMENTER: The WEKA Experiment Environment enables the user to create,
run, modify, and analyses experiments in a more convenient manner. It can also be run from
the command line using the Simple CLI.

New Experiment:
After clicking on new default parameters for an Experiment are defined.
We can choose the experiment in two different modes 1) Simple and 2) Advanced

Result Destination:

By default, an ARFF file is the destination for the results output. But we can also choose
CSV file as the destination for output file. The advantage of ARFF or CSV files is that they
can be created without any additional classes. The drawback is the lack of ability to resume
the interrupted experiment.
Experiment type:
The user can choose between the following three different types:

Cross-validation: it is a default type and it performs stratified cross-validation with the given
number of folds.

● Train/Test Percentage Split: it splits a dataset according to the given percentage into a
train and a test file after the order of the data has been randomized and stratified.

● Train/Test Percentage Split: As it is impossible to specify an explicit train/test files

pair, one can abuse this type to un-merge previously merged train and test file into the
two original files.

● Additionally, one can choose between Classification and Regression, depending on

the datasets and classifiers one uses.
Data Sets:

One can add dataset files either with an absolute path or with a relative path.
Iteration control:

4
● Number of repetitions: In order to get statistically meaningful results, the default
number of iterations is 10.

5
● Data sets first/Algorithms first: As soon as one has more than one dataset and
algorithm, it can be useful to switch from datasets being iterated over first to
algorithms.

Algorithms: New algorithms can be added via the “Add New” button. Opening this dialog
for the first time, ZeroR is presented.

By clicking on the Choose button one can choose another classifier which is as shown in the
below diagram:

The “Filter...” button enables us to highlight classifiers that can handle certain attributes and
class types. With “Remove Filter” button one can clear the classifiers that are highlighted
earlier.

6
7
With the Load options... and Save options... buttons one can load and save the setup of a
selected classifier from and to XML.

Running an Experiment:

To run the current experiment, click the Run tab at the top of the Experiment Environment
window. The current experiment performs 10 runs of 10-fold stratified cross-validation.

After clicking the Run tab, it shows a window with start button and stop button, by clicking
on start button we can run the experiment and by clicking on stop button we can run the
experiment.

8
9
PRACTICAL 2

Objective : To demonstrate the process of preprocessing in WEKA on dataset

student.arff

This experiment illustrates some of the basic data preprocessing operations that can be
performed using WEKA-Explorer. The sample dataset used for this example is the student
data available in arff format.
Procedure:

Step1: Loading the data. We can load the dataset into WEKA by clicking on open button in
preprocessing interface and selecting the appropriate file.

Step2: Once the data is loaded, WEKA will recognize the attributes and during the scan of
the data WEKA will compute some basic strategies on each attribute. The left panel in the
above figure shows the list of recognized attributes while the top panel indicates the names of
the base relation or table and the current working relation (which are same initially).
Step3:Clicking on an attribute in the left panel will show the basic statistics on the attributes
for the categorical attributes the frequency of each attribute value is shown, while for
continuous attributes we can obtain min, max, mean, standard deviation and deviation etc.,

Step4:The visualization in the right button panel in the form of cross-tabulation across two
attributes.

Note:we can select another attribute using the dropdown list.

Step5:Selecting or filtering attributes

Removing an attribute-When we need to remove an attribute,we can do this by using the

attribute filters in WEKA.In the filter model panel,click on choose button,This will show a
popup window with a list of available filters.

Scroll down the list and select the “WEKA.filters.unsupervised.attribute.remove” filters.

Step 6:a)Next click the textbox immediately to the right of the choose button.In the resulting
dialog box enter the index of the attribute to be filtered out.

b)Make sure that invert selection option is set to false.The click OK now in the filter box.you
will see “Remove-R-7”.

c)Click the apply button to apply filter to this data.This will remove the attribute and create
new working relation.

10
d)Save the new working relation as an arff file by clicking save button on the
top(button)panel.(student.arff)

11
Discretization

1)Sometimes association rule mining can only be performed on categorical data.This requires
performing discretization on numeric or continuous attributes.In the following example let us
discretize age attribute.

Let us divide the values of age attribute into three bins(intervals).

First load the dataset into WEKA (student.arff)

Select the age attribute.

Activate filter-dialog box and select “WEKA.filters.unsupervised.attribute.discretize.

To change the defaults for the filters, click on the box immediately to the right of the choose
button.

We enter the index for the attribute to be discretized. In this case the attribute is age. So we
must enter ‘1’ corresponding to the age attribute.

Enter ‘3’ as the number of bins. Leave the remaining field values as they are.

Click OK button.

Click apply in the filter panel. This will result in a new working relation with the selected
attribute partition into 3 bins.

Save the new working relation in a file called student-data-discretized.arff

Dataset student .arff

@relation student

@attribute age {<30,30-40,>40}

@attribute income {low, medium, high}

@attribute student {yes, no}

@attribute credit-rating {fair, excellent}

@attribute buyspc {yes, no}

@data

12
13
<30, high, no, fair, no <30, high, no, excellent, no 30-40, high, no, fair, yes >40,

medium, no, fair, yes >40, low, yes, fair, yes >40, low, yes, excellent, no

30-40, low, yes, excellent, yes <30, medium, no, fair, no <30, low, yes, fair, no

>40, medium, yes, fair, yes <30, medium, yes, excellent, yes 30-40, medium, no, excellent,

yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

14
The following screenshot shows the effect of discretization.

15
PRACTICAL 3

Objective : To perform preprocessing on dataset labor.arff

This experiment illustrates some of the basic data preprocessing operations that can be
performed using WEKA-Explorer. The sample dataset used for this example is the labor data
available in arff format.
Procedure:

Step1: Loading the data. We can load the dataset into WEKA by clicking on open button in
preprocessing interface and selecting the appropriate file.

Step3: Clicking on an attribute in the left panel will show the basic statistics on the attributes
for the categorical attributes the frequency of each attribute value is shown, while for
continuous attributes we can obtain min, max, mean, standard deviation and deviation etc.,

Step4: he visualization in the right button panel in the form of cross-tabulation across two
attributes.

Note:we can select another attribute using the dropdown list.

Step5: Selecting or filtering attributes

Removing an attribute-When we need to remove an attribute,we can do this by using the

attribute filters in WEKA.In the filter model panel,click on choose button,This will show a
popup window with a list of available filters.

Scroll down the list and select the “WEKA.filters.unsupervised.attribute.remove” filters.

Step 6:a)Next click the textbox immediately to the right of the choose button.In the resulting
dialog box enter the index of the attribute to be filtered out.

b)Make sure that invert selection option is set to false.The click OK now in the filter box.you
will see “Remove-R-7”.

c)Click the apply button to apply filter to this data.This will remove the attribute and create
new working relation.

d)Save the new working relation as an arff file by clicking save button on the
top(button)panel.(labor.arff)

16
17
Discretization

Let us divide the values of duration attribute into three bins(intervals).

First load the dataset into WEKA(labor.arff)

Select the duration attribute.

Activate filter-dialog box and select” “WEKA.filters.unsupervised.attribute.discretize from

the list.

To change the defaults for the filters, click on the box immediately to the right of thechoose
button.

We enter the index for the attribute to be discretized.In this case the attribute is duration So
we must enter ‘1’ corresponding to the duration attribute.

Enter ‘1’ as the number of bins. Leave the remaining field values as they are.

Click OK button.

Click apply in the filter panel.This will result in a new working relation with the selected
attribute partition into 1 bin.

Save the new working relation in a file called labor-data-discretized.arff

Dataset : labor.arff

18
19
The following screenshot shows the effect of discretization

20
PRACTICAL 4

Objective: To Perform Association Rule Mining on dataset

contactlenses.arff using Apriori algorithm

This experiment illustrates some of the basic elements of asscociation rule mining using
WEKA. The sample dataset used for this example is contactlenses.arff
Procedure:

Step1: Open the data file in WEKA Explorer. It is presumed that the required data fields have
been discretized. In this example it is age attribute.

Step2: Clicking on the associate tab will bring up the interface for association rule algorithm.

Step3: We will use Apriori algorithm. This is the default algorithm.

Step4: In order to change the parameters for the run (example support, confidence etc) we
click on the text box immediately to the right of the choose button.

Dataset contactlenses.arff

21
The following screenshot shows the association rules that were generated when apriori
algorithm is applied on the given dataset.

22
23
PRACTICAL 5

Objective : To Perform Association Rule Mining on dataset test.arff

using Apriori algorithm

This experiment illustrates some of the basic elements of asscociation rule mining using
WEKA. The sample dataset used for this example is test.arff.
Procedure:

Step1: Open the data file in WEKA Explorer. It is presumed that the required data fields have
been discretized. In this example it is age attribute.

Step2: Clicking on the associate tab will bring up the interface for association rule algorithm.

Step3: We will use Apriori algorithm. This is the default algorithm.

Step4: In order to change the parameters for the run (example support, confidence etc) we
click on the text box immediately to the right of the choose button.

Dataset test.arff

@relation test

@attribute admissionyear {2005,2006,2007,2008,2009,2010}

@attribute course {cse,mech,it,ece}

@data

2005, cse

2005, it

2005, cse

2006, mech

2006, it

2006, ece

2007, it

2007, cse

2008, it

2008, cse

24
2009, it

2009, ece

25
The following screenshot shows the association rules that were generated when apriori
algorithm is applied on the given dataset.

26
27
PRATICAL 6
Objective: To perform classification rule process on dataset student.arff
using j48 algorithm.

This experiment illustrates the use of j-48 classifier in WEKA. The sample data set used in
this experiment is “student” data available at arff format. This document assumes that
appropriate data pre-processing has been performed.
Procedure:

Step-1: We begin the experiment by loading the data (student.arff)into WEKA.

Step2:Next we select the “classify” tab and click “choose” button to select the “j48”classifier.

Step3: Now we specify the various parameters. These can be specified by clicking in the text
box to the right of the chose button. In this example, we accept the default values. The
default version does perform some pruning but does not perform error pruning.

Step4: Under the “text” options in the main panel. We select the 10-fold cross validation as
our evaluation approach. Since we don’t have separate evaluation data set, this is necessary to
get a reasonable idea of accuracy of generated model.

Step-5: We now click ”start” to generate the model the ACSII version of the tree as well as
evaluation statistic will appear in the right panel when the model construction is complete.

Step-6: Note that the classification accuracy of model is about 69%.this indicates that we may
find more work.(Either in preprocessing or in selecting current parameters for the
classification)

Step-7: Now WEKA also lets us a view a graphical version of the classification tree. This can
be done by right clicking the last result set and selecting “visualize tree” from the pop-up
menu.

Step-8: We will use our model to classify the new instances.

Step-9: In the main panel under “text” options click the “supplied test set” radio button and
then click the “set” button. This wills pop-up a window which will allow you to open the file
containing test instances.

28
Dataset student .arff

@relation student

@attribute age {<30,30-40,>40} @attribute income {low, medium, high} @attribute student

{yes, no}

@attribute credit-rating {fair, excellent} @attribute buyspc {yes, no}

@data

<30, high, no, fair, no <30, high, no, excellent, no 30-40, high, no, fair, yes >40,

medium, no, fair, yes >40, low, yes, fair, yes >40, low, yes, excellent, no

30-40, low, yes, excellent, yes <30, medium, no, fair, no <30, low, yes, fair, no

>40, medium, yes, fair, yes <30, medium, yes, excellent, yes 30-40, medium, no,

excellent, yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

29
The following screenshot shows the classification rules that were generated when j48
algorithm is applied on the given dataset.

30
31
32
PRACTICAL 7
Objective: Demonstration of classification rule process on dataset
employee.arff using j48 algorithm

This experiment illustrates the use of j-48 classifier in WEKA.the sample data set used in this
experiment is “employee” data available in arff format. This document assumes that
appropriate data pre-processing has been performed.
Prodedure:
Step 1: We begin the experiment by loading the data (employee.arff) into WEKA.

Step2: Next we select the “classify” tab and click “choose” button to select the
“j48”classifier.

Step3: Now we specify the various parameters. These can be specified by clicking in the text
box to the right of the chose button. In this example, we accept the default values the default
version does perform some pruning but does not perform error pruning.

Step4: Under the “text “options in the main panel. We select the 10-fold cross validation as
our evaluation approach. Since we don’t have separate evaluation data set, this is necessary to
get a reasonable idea of accuracy of generated model.

Step-5: We now click ”start” to generate the model .the ASCII version of the tree as well as
evaluation statistic will appear in the right panel when the model construction is complete.

Step-6: Note that the classification accuracy of model is about 69%.this indicates that we may
find more work. (Either in preprocessing or in selecting current parameters for the
classification)

Step-7: Now WEKA also lets us a view a graphical version of the classification tree. This can
be done by right clicking the last result set and selecting “visualize tree” from the pop-up
menu.

Step-8: We will use our model to classify the new instances.

Step-9: In the main panel under “text “options click the “supplied test set” radio button and
then click the “set” button. This wills pop-up a window which will allow you to open the file
containing test instances.

33
Dataset employee.arff:

@relation employee

@attribute age {25, 27, 28, 29, 30, 35, 48} @attribute salary{10k,15k,17k,20k,25k,30k,35k,32k}

@attribute performance {good, avg, poor}

@data

25, 10k, poor 27, 15k, poor 27, 17k, poor 28, 17k, poor

29, 20k, avg 30, 25k, avg 29, 25k, avg 30, 20k, avg 35,

32k, good 48, 34k, good 48, 32k,good

34
The following screenshot shows the classification rules that were generated whenj48

algorithm is applied on the given dataset.

35
36
37
PRACTICAL 8
Objective:Demonstration of classification rule process on dataset
employee.arff using id3 algorithm

This experiment illustrates the use of id3 classifier in WEKA. The sample data set used in
this experiment is “employee”data available at arff format. This document assumes that
appropriate data pre-processing has been performed.

Procedure:

Step1: We begin the experiment by loading the data (employee.arff) into WEKA.

Step2: We select the “classify” tab and click “choose” button to select the “id3”classifier.

Step3: now we specify the various parameters. These can be specified by clicking in the text
box to the right of the chose button. In this example, we accept the default values his default
version does perform some pruning but does not perform error pruning.

Step4: Under the “text “options in the main panel. We select the 10-fold cross validation as
our evaluation approach. Since we don’t have separate evaluation data set, this is necessary
to get a reasonable idea of accuracy of generated model.

Step-5: We now click start to generate the model .the ASCII version of the tree as well as
evaluation statistic will appear in the right panel when the model construction is complete.

Step-6: Note that the classification accuracy of model is about 69%.this indicates that we may
find more work. (Either in preprocessing or in selecting current parameters for the
classification)

Step-7: Now WEKA also lets us a view a graphical version of the classification tree. This can
be done by right clicking the last result set and selecting “visualize tree” from the pop-up
menu.

Step-8: We will use our model to classify the new instances.

Step-9: In the main panel under “text “options click the “supplied test set” radio button and
then click the “set” button. This will show pop-up window which will allow you to open the
file containing test instances.

38
Data set employee.arff:

@relation employee

@attribute age {25, 27, 28, 29, 30, 35, 48} @attribute salary{10k,15k,17k,20k,25k,30k,35k,32k}

@attribute performance {good, avg, poor}

@data

25, 10k, poor 27, 15k, poor 27, 17k, poor 28, 17k, poor

29, 20k, avg 30, 25k, avg 29, 25k, avg 30, 20k, avg 35,

32k, good 48, 34k, good 48, 32k, good

39
The following screenshot shows the classification rules that were generated when id3
algorithm is applied on the given dataset.

40
41
PRACTICAL 9

Objective : Demonstration of classification rule process on dataset

employee.arff using naïve bayes algorithm

This experiment illustrates the use of naïve bayes classifier in WEKA. The sample data set
used in this experiment is “employee”data available in arff format. This document assumes
that appropriate data preprocessing has been performed.

Procedure:
1. We begin the experiment by loading the data (employee.arff) into WEKA.

Step2: Next we select the “classify” tab and click “choose” button to select the
“id3”classifier.

Step3: Now we specify the various parameters. These can be specified by clicking in the text
box to the right of the chose button. In this example, we accept the default values his default
version does perform some pruning but does not perform error pruning.

Step4: Under the “text “options in the main panel. We select the 10-fold cross validation as
our evaluation approach. Since we don’t have separate evaluation data set, this is necessary
to get a reasonable idea of accuracy of generated model.

Step-5: We now click start to generate the model .the ASCII version of the tree as well as
evaluation statistic will appear in the right panel when the model construction is complete.

Step-6: Note that the classification accuracy of model is about 69%.this indicates that we may
find more work. (Either in preprocessing or in selecting current parameters for the
classification)

Step-7: Now WEKA also lets us a view a graphical version of the classification tree. This can
be done by right clicking the last result set and selecting “visualize tree” from the pop-up
menu.

Step-8: We will use our model to classify the new instances.

42
Data set employee.arff:

@relation employee

@attribute age {25, 27, 28, 29, 30, 35, 48} @attribute salary{10k,15k,17k,20k,25k,30k,35k,32k}

@attribute performance {good, avg, poor}

@data

25, 10k, poor 27, 15k, poor 27, 17k, poor 28, 17k, poor

29, 20k, avg 30, 25k, avg 29, 25k, avg 30, 20k, avg 35,

32k, good 48, 34k, good 48, 32k, good

43
The following screenshot shows the classification rules that were generated when naive bayes
algorithm is applied on the given dataset.

44
45
PRACTICAL 10

Objective: To perform clustering on dataset iris.arff using simple k-means

This experiment illustrates the use of simple k-mean clustering with WEKA explorer. The
sample data set used for this example is based on the iris data available in ARFF format. This
document assumes that appropriate preprocessing has been performed. This iris dataset
includes 150 instances.
Procedure:

Step 1: Run the WEKA explorer and load the data file iris.arff in preprocessing interface.

Step 2: In order to perform clustering select the ‘cluster’ tab in the explorer and click on the
choose button. This step results in a dropdown list of available clustering algorithms.

Step 3 : In this case we select ‘simple k-means’.

Step 4: Next click in text button to the right of the choose button to get popup window shown
in the screenshots. In this window we enter six on the number of clusters and we leave the
value of the seed on as it is. The seed value is used in generating a random number which is
used for making the internal assignments of instances of clusters.

Step 5 : Once of the option have been specified. We run the clustering algorithm there we
must make sure that they are in the ‘cluster mode’ panel. The use of training set option is
selected and then we click ‘start’ button. This process and resulting window are shown in the
following screenshots.

Step 6 : The result window shows the centroid of each cluster as well as statistics on the
number and the percent of instances assigned to different clusters. Here clusters centroid are
means vectors for each clusters. This clusters can be used to characterized the cluster.For eg,
the centroid of cluster1 shows the class iris.versicolor mean value of the sepal length is
5.4706, sepal width 2.4765, petal width 1.1294, petal length 3.7941.

Step 7: Another way of understanding characterstics of each cluster through visualization ,we
can do this, try right clicking the result set on the result. List panel and selecting the visualize
cluster assignments.

46
The following screenshot shows the clustering rules that were generated when simple k
means algorithm is applied on the given dataset.

47
From the above visualization, we can understand the distribution of sepal length and petal
length in each cluster. For instance, for each cluster is dominated by petal length. In this case
by changing the color dimension to other attributes we can see their distribution with in each
of the cluster.

Step 8: We can assure that resulting dataset which included each instance along with its
assign cluster. To do so we click the save button in the visualization window and save the
result iris k-mean .The top portion of this file is shown in the following figure.

48
PRACTICAL 11
Objective: To perfrom clustering process on dataset student.arff using
simple k-means

This experiment illustrates the use of simple k-mean clustering with WEKA explorer. The
sample data set used for this example is based on the student data available in ARFF format.
This document assumes that appropriate preprocessing has been performed. This istudent
dataset includes 14 instances.

Procedure:
Step 1: Run the WEKA explorer and load the data file student.arff in preprocessing interface.

Step 2: In order to perform clustering select the ‘cluster’ tab in the explorer and click on the
choose button. This step results in a dropdown list of available clustering algorithms.

Step 3 : In this case we select ‘simple k-means’.

Step 7: Another way of understanding characteristics of each cluster through visualization,

we can do this, try right clicking the result set on the result. List panel and selecting the
visualize cluster assignments.

Step 8: We can assure that resulting dataset which included each instance along with its
assign cluster. To do so we click the save button in the visualization window and save the
result student k-mean .The top portion of this file is shown in the following figure.

49
Dataset student .arff

@relation student

@attribute age {<30,30-40,>40} @attribute income {low,medium,high} @attribute student

{yes,no}

@attribute credit-rating {fair,excellent} @attribute buyspc {yes,no}

@data

<30, high, no, fair, no <30, high, no, excellent, no 30-40, high, no, fair, yes >40,

medium, no, fair, yes >40, low, yes, fair, yes >40, low, yes, excellent, no

30-40, low, yes, excellent, yes <30, medium, no, fair, no <30, low, yes, fair, no

>40, medium, yes, fair, yes <30, medium, yes, excellent, yes 30-40, medium, no,

excellent, yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

50
The following screenshot shows the clustering rules that were generated when simple k-

means algorithm is applied on the given dataset.

51
52
53

Azure Exam Latest 104
0% (3)
Azure Exam Latest 104
13 pages
Expt 1 Docx
No ratings yet
Expt 1 Docx
15 pages
DM Lab
No ratings yet
DM Lab
101 pages
DMW lab Print
No ratings yet
DMW lab Print
21 pages
DM Lab Task-1 Expr's-1
No ratings yet
DM Lab Task-1 Expr's-1
58 pages
Wekappt
No ratings yet
Wekappt
58 pages
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
No ratings yet
Data Mining Lab Manual: Aurora's PG College Moosarambagh Mca Department
42 pages
Weka Experiment
No ratings yet
Weka Experiment
13 pages
DWDM Lab Manual Using Weka-For MIC
No ratings yet
DWDM Lab Manual Using Weka-For MIC
42 pages
DHW Lab (Ex1 To 3)
No ratings yet
DHW Lab (Ex1 To 3)
18 pages
DM Lab Material
No ratings yet
DM Lab Material
88 pages
Data-Mining-Lab-Manual Cs 703b
No ratings yet
Data-Mining-Lab-Manual Cs 703b
41 pages
DMLB 1
No ratings yet
DMLB 1
3 pages
DWDM LAB MANUAL
No ratings yet
DWDM LAB MANUAL
55 pages
DWDM Lab Manual
No ratings yet
DWDM Lab Manual
47 pages
CS-703 (B) Data Warehousing and Data Mining Lab
No ratings yet
CS-703 (B) Data Warehousing and Data Mining Lab
50 pages
MC0717 Lab Manual
No ratings yet
MC0717 Lab Manual
42 pages
Dinesh DM
No ratings yet
Dinesh DM
34 pages
Data Mining - Lab - Manual
No ratings yet
Data Mining - Lab - Manual
20 pages
Weka (20030421-Version1 by Kdelab)
No ratings yet
Weka (20030421-Version1 by Kdelab)
51 pages
DWDM Record With Alignment
No ratings yet
DWDM Record With Alignment
69 pages
DWBI Lab Manual 2023-24 Final
No ratings yet
DWBI Lab Manual 2023-24 Final
40 pages
Lecture 12 - Weka Tutorial
No ratings yet
Lecture 12 - Weka Tutorial
84 pages
dwdm_file-final_ver3.pdf_20241230_172003_0000
No ratings yet
dwdm_file-final_ver3.pdf_20241230_172003_0000
54 pages
DM Tools Sample-1
No ratings yet
DM Tools Sample-1
72 pages
Weka Lab
No ratings yet
Weka Lab
11 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
50 pages
Data Mining Lab Manual
No ratings yet
Data Mining Lab Manual
40 pages
WEKA Manual
No ratings yet
WEKA Manual
25 pages
Data Mining Lab File
No ratings yet
Data Mining Lab File
20 pages
BI_Experiment _No_1
No ratings yet
BI_Experiment _No_1
7 pages
Task 0: Weka Introduction
No ratings yet
Task 0: Weka Introduction
11 pages
WEKA Explorer User Guide For Version 3-4: Richard Kirkby Eibe Frank July 15, 2008
No ratings yet
WEKA Explorer User Guide For Version 3-4: Richard Kirkby Eibe Frank July 15, 2008
13 pages
Laboratory Manual On: Data Mining
No ratings yet
Laboratory Manual On: Data Mining
41 pages
DWDM - Case Study On Weka - Ceb624
No ratings yet
DWDM - Case Study On Weka - Ceb624
13 pages
Appendix Weka
No ratings yet
Appendix Weka
17 pages
hw2 Datapreproc
No ratings yet
hw2 Datapreproc
15 pages
2.3 Weka Tool
No ratings yet
2.3 Weka Tool
84 pages
DMW_LabFile_0901CS243D11_swastik
No ratings yet
DMW_LabFile_0901CS243D11_swastik
25 pages
Experiment No: 01 Data Exploration & Data Preprocessing
No ratings yet
Experiment No: 01 Data Exploration & Data Preprocessing
54 pages
DWM1
No ratings yet
DWM1
19 pages
aiml manual
No ratings yet
aiml manual
27 pages
DATA WAREHOUSING -TO WRITE
No ratings yet
DATA WAREHOUSING -TO WRITE
23 pages
DMW Lab Manual
No ratings yet
DMW Lab Manual
42 pages
Lab Manual - DM
No ratings yet
Lab Manual - DM
56 pages
Demonstration of Preprocessing On Dataset Student - Arff Aim: This Experiment Illustrates Some of The Basic Data Preprocessing Operations That Can Be
100% (1)
Demonstration of Preprocessing On Dataset Student - Arff Aim: This Experiment Illustrates Some of The Basic Data Preprocessing Operations That Can Be
4 pages
Deepak Dmbi File
No ratings yet
Deepak Dmbi File
40 pages
Weka Tutorial
No ratings yet
Weka Tutorial
8 pages
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
No ratings yet
Data Mining Term Project Machine Learning With WEKA: Weka Explorer Tutorial For Version 3.4.3
42 pages
Introduction To Weka-A Toolkit For Machine Learning
No ratings yet
Introduction To Weka-A Toolkit For Machine Learning
11 pages
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
100% (1)
Weka-: Data Warehousing and Data Mining Lab Manual-Week 9
8 pages
Rintro Wekacomplete
No ratings yet
Rintro Wekacomplete
135 pages
dw9exp1(1)
No ratings yet
dw9exp1(1)
43 pages
Workshop 1
No ratings yet
Workshop 1
16 pages
Weka Data Miningvsem
No ratings yet
Weka Data Miningvsem
7 pages
data mining and warehousing
No ratings yet
data mining and warehousing
30 pages
Journal Data Mining
No ratings yet
Journal Data Mining
31 pages
Lab04
No ratings yet
Lab04
7 pages
Tableau 8.2 Training Manual: From Clutter to Clarity
From Everand
Tableau 8.2 Training Manual: From Clutter to Clarity
Larry Keller
No ratings yet
Tableau Training Manual 9.0 Basic Version: This Via Tableau Training Manual Was Created for Both New and Intermediate
From Everand
Tableau Training Manual 9.0 Basic Version: This Via Tableau Training Manual Was Created for Both New and Intermediate
Larry Keller
3/5 (1)
Laboratory Practice, Testing, and Reporting: Time-Honored Fundamentals for the Sciences
From Everand
Laboratory Practice, Testing, and Reporting: Time-Honored Fundamentals for the Sciences
Dwayne Phillips
No ratings yet
PhonePe Statement Mar2024 Mar2025
No ratings yet
PhonePe Statement Mar2024 Mar2025
198 pages
2 Sim Hydraulics Reference
100% (1)
2 Sim Hydraulics Reference
430 pages
DS-140 DESIGN SPECIFICATION Payroll Reports Version 1.0
No ratings yet
DS-140 DESIGN SPECIFICATION Payroll Reports Version 1.0
78 pages
Updated_Hotel_Cancellation_Project_Report
No ratings yet
Updated_Hotel_Cancellation_Project_Report
25 pages
ACN Lab Manual (1) (1) - 4-70
No ratings yet
ACN Lab Manual (1) (1) - 4-70
67 pages
Stable and Effective Use Cyberspace
No ratings yet
Stable and Effective Use Cyberspace
13 pages
SQL Assignment 3
No ratings yet
SQL Assignment 3
26 pages
Prob AND STochastic Process
No ratings yet
Prob AND STochastic Process
300 pages
FLM2-PRT-0006 FilmArray2.0 Instrument Quick Guide en
No ratings yet
FLM2-PRT-0006 FilmArray2.0 Instrument Quick Guide en
2 pages
Types of Computer Viruses
No ratings yet
Types of Computer Viruses
14 pages
Robbery On DevOps Understanding and Mitigating Illicit Cryptomining On Continuous Integration Service Platforms2022
No ratings yet
Robbery On DevOps Understanding and Mitigating Illicit Cryptomining On Continuous Integration Service Platforms2022
16 pages
Project Report-Final Yr (6months) - Eshan
25% (4)
Project Report-Final Yr (6months) - Eshan
52 pages
Sample - Powerpoint 2016 Advanced Training Manual Usa
No ratings yet
Sample - Powerpoint 2016 Advanced Training Manual Usa
194 pages
Civil Internship Project
No ratings yet
Civil Internship Project
39 pages
20 Minute - Test No.1 - KEY
No ratings yet
20 Minute - Test No.1 - KEY
2 pages
5 Steps For Planning Surveys: Tip Sheet
No ratings yet
5 Steps For Planning Surveys: Tip Sheet
5 pages
Pro E Config Options
No ratings yet
Pro E Config Options
52 pages
Salesforce Summer23 Release Notes
No ratings yet
Salesforce Summer23 Release Notes
729 pages
Iris Script
No ratings yet
Iris Script
366 pages
DP2
No ratings yet
DP2
8 pages
Gambar Sebelum Running
No ratings yet
Gambar Sebelum Running
1 page
Basic Bussiness English Writing
No ratings yet
Basic Bussiness English Writing
20 pages
Dynamic Modification of The Inspection Scope: Release 4.6C
No ratings yet
Dynamic Modification of The Inspection Scope: Release 4.6C
38 pages
Clinical Chemistry Autoanalyzers Diconex Brand InCCA Models - 2019
No ratings yet
Clinical Chemistry Autoanalyzers Diconex Brand InCCA Models - 2019
21 pages
How Does It Work?
No ratings yet
How Does It Work?
9 pages
MESI Protocol
No ratings yet
MESI Protocol
9 pages
Resolver Case 7241101: Complaint Against Lenskart by Anuj Kumar Generated 16/05/21 at 16:05 GMT
No ratings yet
Resolver Case 7241101: Complaint Against Lenskart by Anuj Kumar Generated 16/05/21 at 16:05 GMT
23 pages
First Grade Lesson Plan - CTM
No ratings yet
First Grade Lesson Plan - CTM
3 pages
Standard VHDL Packages
No ratings yet
Standard VHDL Packages
2 pages

Data Warehousing and Data Mining Lab

Uploaded by

Data Warehousing and Data Mining Lab

Uploaded by

DATA WAREHOUSING

AND DATA MINING LAB

SUBMITTED TO: MADE BY:

Objective: Introduction about launching the WEKA (Waikato

The buttons can be used to start the following applications:

Explorer: An environment for exploring data with WEKA.

Experimenter: An environment for performing experiments and conducting statistical tests

● Train/Test Percentage Split: As it is impossible to specify an explicit train/test files

● Additionally, one can choose between Classification and Regression, depending on

Objective : To demonstrate the process of preprocessing in WEKA on dataset

Note:we can select another attribute using the dropdown list.

Step5:Selecting or filtering attributes

Removing an attribute-When we need to remove an attribute,we can do this by using the

Scroll down the list and select the “WEKA.filters.unsupervised.attribute.remove” filters.

Let us divide the values of age attribute into three bins(intervals).

First load the dataset into WEKA (student.arff)

Select the age attribute.

Activate filter-dialog box and select “WEKA.filters.unsupervised.attribute.discretize.

Save the new working relation in a file called student-data-discretized.arff

Dataset student .arff

@attribute age {<30,30-40,>40}

@attribute income {low, medium, high}

@attribute student {yes, no}

@attribute credit-rating {fair, excellent}

@attribute buyspc {yes, no}

yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

Objective : To perform preprocessing on dataset labor.arff

Note:we can select another attribute using the dropdown list.

Step5: Selecting or filtering attributes

Removing an attribute-When we need to remove an attribute,we can do this by using the

Scroll down the list and select the “WEKA.filters.unsupervised.attribute.remove” filters.

Let us divide the values of duration attribute into three bins(intervals).

First load the dataset into WEKA(labor.arff)

Select the duration attribute.

Activate filter-dialog box and select” “WEKA.filters.unsupervised.attribute.discretize from

Save the new working relation in a file called labor-data-discretized.arff

Objective: To Perform Association Rule Mining on dataset

Step3: We will use Apriori algorithm. This is the default algorithm.

Objective : To Perform Association Rule Mining on dataset test.arff

Step3: We will use Apriori algorithm. This is the default algorithm.

@attribute admissionyear {2005,2006,2007,2008,2009,2010}

@attribute course {cse,mech,it,ece}

Step-1: We begin the experiment by loading the data (student.arff)into WEKA.

Step-8: We will use our model to classify the new instances.

@attribute credit-rating {fair, excellent} @attribute buyspc {yes, no}

excellent, yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

Step-8: We will use our model to classify the new instances.

@attribute performance {good, avg, poor}

32k, good 48, 34k, good 48, 32k,good

algorithm is applied on the given dataset.

Step-8: We will use our model to classify the new instances.

@attribute performance {good, avg, poor}

32k, good 48, 34k, good 48, 32k, good

Objective : Demonstration of classification rule process on dataset

Step-8: We will use our model to classify the new instances.

@attribute performance {good, avg, poor}

32k, good 48, 34k, good 48, 32k, good

Objective: To perform clustering on dataset iris.arff using simple k-means

Step 3 : In this case we select ‘simple k-means’.

Step 3 : In this case we select ‘simple k-means’.

Step 7: Another way of understanding characteristics of each cluster through visualization,

@attribute age {<30,30-40,>40} @attribute income {low,medium,high} @attribute student

@attribute credit-rating {fair,excellent} @attribute buyspc {yes,no}

excellent, yes 30-40, high, yes, fair, yes

>40, medium, no, excellent, no

means algorithm is applied on the given dataset.

You might also like