JSPM'S Bhivarabai Sawant Institute of Technology & Research: Mini Project Report On
Submitted by:
Charul Joshi(BEA_40)
Kirti Reddy(BEA_39)
Danesh Bastani(BEA_48)
CERTIFICATE
This is to certify that Charul Joshi (BEA_40), Kirti Reddy (BEA_39) and Danesh
Bastani (BEA_48) submitted their project report under my guidance and supervision. The work has
been done to my satisfaction during the academic year 2019-2020 under Savitribai Phule Pune University
guidelines.
Date:
It is a great pleasure and immense satisfaction to express our deepest sense of
gratitude and thanks to everyone who has directly or indirectly helped us in
completing our project work successfully.
We express our gratitude towards our guide, Prof. Nilufar Zaman, and Dr. G. M. Bhandari,
Head of the Department of Computer Engineering, Bhivarabai Sawant
Institute of Technology and Research, Wagholi, Pune, who guided and encouraged
us in completing the project work on schedule. We would also like to thank our
Principal for allowing us to pursue our project in this institute.
Charul Joshi(BEA_40)
Kirti Reddy(BEA_39)
Danesh Bastani(BEA_48)
INDEX
Sr. No. Chapters Page No.
CERTIFICATE PAGE I
ACKNOWLEDGEMENT II
ABSTRACT III
INDEX PAGE IV
LIST OF FIGURES V
1. INTRODUCTION 1
2. OBJECTIVES AND SCOPE
3. PROPOSED SYSTEM METHODOLOGY 4
4. RESULT AND DISCUSSIONS
5. ADVANTAGES AND DISADVANTAGES 19
6. CONCLUSION 20
7. REFERENCES 21
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
Fisher’s Iris data base (Fisher, 1936) is perhaps the best known database to be
found in the pattern recognition literature. The data set contains 3 classes of 50
instances each, where each class refers to a type of iris plant. One class is linearly
separable from the other two; the latter are not linearly separable from each other.
The data base contains the following attributes:
1). sepal length in cm
2). sepal width in cm
3). petal length in cm
4). petal width in cm
On the basis of these attributes we classify an iris plant into one of three
classes:
Iris Setosa
Iris Versicolour
Iris Virginica
First we start with data preprocessing, where we handle the null values in the data
and handle the outliers (values that fall outside the expected range). The next
step is exploratory data analysis, where we visualize the data, compute the
correlation between each attribute and the output (a coefficient that always lies
between +1 and -1), and plot graphs of all the attributes; from this visualization
we identify the important features.
Data preprocessing and transformation of the initial dataset. The steps of data
preprocessing are described below:
- Data cleaning: fill in missing values, resolve inconsistencies and smooth noisy data.
- Data integration: combine multiple databases or files.
- Data transformation: aggregation and normalization.
- Data reduction: reduce the volume of data while preserving similar analytical results.
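The preprocessing steps above can be sketched in Python with pandas. This is only an illustration (it is not part of the report's Weka workflow); the sample rows are taken from the iris listing given later, with one value deliberately blanked out to show cleaning.

```python
import pandas as pd

# A handful of iris rows (one from each class, plus a missing value).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, None, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Data cleaning: fill the missing value with the column mean (5.82 here).
df = df.fillna(df.mean())

# Outlier handling: clip each attribute to its 1.5*IQR fences so values
# outside the expected range are pulled back in.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
df = df.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

# Exploratory analysis: pairwise correlation coefficients, each of which
# always lies between -1 and +1.
corr = df.corr()
print(corr.round(2))
```

On the full 150-instance dataset the same calls apply unchanged; the correlation matrix is what reveals, for example, that petal length and petal width move together.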
CHAPTER 2
OBJECTIVES AND SCOPE
Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data. Data mining is
a promising and flourishing frontier in data analysis, and the results of such
analysis have many applications. Data mining is also referred to as Knowledge
Discovery from Data (KDD). This system functions as the machine-driven or
convenient extraction of patterns representing knowledge implicitly stored or
captured in huge databases, data warehouses, the Web, data repositories, and
information streams. Data mining is a multidisciplinary field, encompassing areas
such as information technology, machine learning, statistics, pattern recognition,
information retrieval, neural networks, knowledge-based systems, artificial
intelligence and data visualization.
Classification steps: select the dataset, then choose a classifier.
Dataset in ARFF format:
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
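The ARFF listing above is Fisher's iris data, and the same 150 instances also ship with scikit-learn. As a sketch only (the report itself uses Weka, not this code), a decision tree comparable in spirit to Weka's J48 (a C4.5-style tree) can be trained and evaluated on a hold-out split like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the 150 iris instances (4 attributes, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# An entropy-based decision tree, roughly analogous to J48's
# information-gain splitting.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"hold-out accuracy: {acc:.3f}")
```

The exact accuracy depends on the random split, but it typically lands in the same mid-90s range the report's Weka experiments show.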
CHAPTER 4
RESULT AND DISCUSSIONS
Number of Leaves : 1
Class republican :
-0.41 +
[adoption-of-the-budget-resolution=y] * -0.81 +
[physician-fee-freeze=y] * 1.8 +
[synfuels-corporation-cutback=y] * -0.8
Time taken to build model: 0.66 seconds
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.970 0.048 0.970 0.970 0.970 0.922 0.986 0.987 democrat
0.952 0.030 0.952 0.952 0.952 0.922 0.986 0.972 republican
Weighted Avg. 0.963 0.041 0.963 0.963 0.963 0.922 0.986 0.981
a b <-- classified as
259 8 | a = democrat
8 160 | b = republican
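The summary statistics in the output above follow directly from the confusion matrix (rows are actual classes, columns are predicted classes). Recomputing them by hand:

```python
# Confusion matrix from the Weka output above.
cm = [[259, 8],    # actual democrat:   259 correct, 8 misclassified
      [8, 160]]    # actual republican: 160 correct, 8 misclassified

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]
accuracy = correct / total            # 419/435 = 0.963

tp_rate_dem = cm[0][0] / sum(cm[0])                 # recall for democrat
precision_dem = cm[0][0] / (cm[0][0] + cm[1][0])    # precision for democrat

print(f"accuracy:        {accuracy:.3f}")     # 0.963, matching Weighted Avg.
print(f"TP rate (dem):   {tp_rate_dem:.3f}")  # 0.970, matching the table
print(f"precision (dem): {precision_dem:.3f}")
```

These match the 0.963 weighted average and the 0.970 democrat row in the table above.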
The Naïve Bayes classifier was used for classification, achieving approximately
90% accuracy (weighted TP rate 0.903 in the output below):
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: vote-weka.filters.unsupervised.attribute.Normalize-S1.0-T0.0-
weka.filters.unsupervised.attribute.Normalize-S1.0-T0.0
Instances: 435
Attributes: 17
handicapped-infants
water-project-cost-sharing
adoption-of-the-budget-resolution
physician-fee-freeze
el-salvador-aid
religious-groups-in-schools
anti-satellite-test-ban
aid-to-nicaraguan-contras
mx-missile
immigration
synfuels-corporation-cutback
education-spending
superfund-right-to-sue
crime
duty-free-exports
export-administration-act-south-africa
Class
Test mode: evaluate on training data
Class
Attribute democrat republican
(0.61) (0.39)
===============================================================
handicapped-infants
n 103.0 135.0
y 157.0 32.0
[total] 260.0 167.0
water-project-cost-sharing
n 120.0 74.0
y 121.0 76.0
[total] 241.0 150.0
adoption-of-the-budget-resolution
n 30.0 143.0
y 232.0 23.0
[total] 262.0 166.0
physician-fee-freeze
n 246.0 3.0
y 15.0 164.0
[total] 261.0 167.0
el-salvador-aid
n 201.0 9.0
y 56.0 158.0
[total] 257.0 167.0
religious-groups-in-schools
n 136.0 18.0
y 124.0 150.0
[total] 260.0 168.0
anti-satellite-test-ban
n 60.0 124.0
y 201.0 40.0
[total] 261.0 164.0
aid-to-nicaraguan-contras
n 46.0 134.0
y 219.0 25.0
[total] 265.0 159.0
mx-missile
n 61.0 147.0
y 189.0 20.0
[total] 250.0 167.0
immigration
n 140.0 74.0
y 125.0 93.0
[total] 265.0 167.0
synfuels-corporation-cutback
n 127.0 139.0
y 130.0 22.0
[total] 257.0 161.0
education-spending
n 214.0 21.0
y 37.0 136.0
[total] 251.0 157.0
superfund-right-to-sue
n 180.0 23.0
y 74.0 137.0
[total] 254.0 160.0
crime
n 168.0 4.0
y 91.0 159.0
[total] 259.0 163.0
duty-free-exports
n 92.0 143.0
y 161.0 15.0
[total] 253.0 158.0
export-administration-act-south-africa
n 13.0 51.0
y 174.0 97.0
[total] 187.0 148.0
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.891 0.077 0.948 0.891 0.919 0.802 0.974 0.984 democrat
0.923 0.109 0.842 0.923 0.881 0.802 0.974 0.960 republican
Weighted Avg. 0.903 0.089 0.907 0.903 0.904 0.802 0.974 0.975
a b <-- classified as
238 29 | a = democrat
13 155 | b = republican
Number of Leaves : 6
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.978 0.036 0.978 0.978 0.978 0.942 0.986 0.987 democrat
0.964 0.022 0.964 0.964 0.964 0.942 0.986 0.970 republican
Weighted Avg. 0.972 0.031 0.972 0.972 0.972 0.942 0.986 0.981
a b <-- classified as
261 6 | a = democrat
6 162 | b = republican
Fig 4.6 J48
Cross validation performed on Naïve Bayes:
Fig 4.9
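Weka's default evaluation uses 10-fold cross-validation. The same idea can be sketched with scikit-learn (an illustration only, not the report's actual Weka run), here with Gaussian Naïve Bayes on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# 10-fold cross-validation: the data is split into 10 parts, and each
# part serves once as the test fold while the other 9 train the model.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Averaging over the 10 folds gives a more reliable accuracy estimate than evaluating on the training data, which is why the cross-validated figures can differ from the training-set results reported earlier.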
So we have concluded that the LMT algorithm works best for our iris
flower dataset analysis, giving an accuracy of 97%, and is hereby considered
suitable enough for analyzing the given dataset.
CHAPTER 5
ADVANTAGES AND DISADVANTAGES
ADVANTAGES:
1. Freely available under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language.
3. Runs on almost any modern computing platform.
4. Ease of use due to its graphical user interface.
DISADVANTAGES:
1. It can only handle small datasets.
2. Blockchain can be a thing to be considered.
3. Using it via the command line is a pain without the readline
capability of the shell.
CHAPTER 6
CONCLUSION
Finally, after all the analysis, we obtained the results for the corresponding dataset. We
observe that J48 is the best classification algorithm analyzed; it is followed by Naïve Bayes
and LMT, whose accuracies are close to that of J48. At some points Naïve Bayes and LMT
show the same level of accuracy. We have concluded that the LMT algorithm works well
for our iris dataset analysis, giving an accuracy of 97%, and is hereby considered suitable
enough for analyzing the given dataset.
REFERENCES
1. https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/iris.arff
2. https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf
3. https://courses.soe.ucsc.edu/courses/tim245/Spring12/01/pages/attached-files/attachments/11549