JSPM'S Bhivarabai Sawant Institute of Technology & Research: Mini Project Report On
Submitted by:
Charul Joshi(BEA_40)
Kirti Reddy(BEA_39)
Danesh Bastani(BEA_48)
CERTIFICATE
This is to certify that Charul Joshi (BEA_40), Kirti Reddy (BEA_39) and Danesh
Bastani (BEA_48) submitted their project report under my guidance and supervision. The work has
been done to my satisfaction during the academic year 2019-2020 under Savitribai Phule Pune University
guidelines.
Date:
It is a great pleasure and immense satisfaction to express our deepest sense of
gratitude and thanks to everyone who has directly or indirectly helped us in
completing our project work successfully.
We express our gratitude towards our guide, Prof. Nilufar Zaman, and Dr. G. M. Bhandari,
Head of the Department of Computer Engineering, Bhivarabai Sawant
Institute of Technology and Research, Wagholi, Pune, who guided and encouraged
us in completing the project work on schedule. We would also like to thank our
Principal for allowing us to pursue our project in this institute.
Charul Joshi(BEA_40)
Kirti Reddy(BEA_39)
Danesh Bastani(BEA_48)
INDEX
Sr. No. Chapters Page No.
CERTIFICATE PAGE I
ACKNOWLEDGEMENT II
ABSTRACT III
INDEX PAGE IV
LIST OF FIGURES V
1. INTRODUCTION 1
2. OBJECTIVES AND SCOPE
3. PROPOSED SYSTEM METHODOLOGY 4
4. RESULT AND DISCUSSIONS
5. ADVANTAGES AND DISADVANTAGES 19
6. CONCLUSION 20
7. REFERENCES 21
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
Fisher’s Iris data base (Fisher, 1936) is perhaps the best known database to be
found in the pattern recognition literature. The data set contains 3 classes of 50
instances each, where each class refers to a type of iris plant. One class is linearly
separable from the other two; the latter are not linearly separable from each other.
The data base contains the following attributes:
1). sepal length in cm
2). sepal width in cm
3). petal length in cm
4). petal width in cm
On the basis of these attributes we classify an iris plant into one of three
classes:
Iris Setosa
Iris Versicolour
Iris Virginica
First we start with data preprocessing, where we handle the null values in the data
and handle the outliers (values that fall outside the expected range). The next
step is exploratory data analysis, where we visualize the data, compute the
correlation between each attribute and the output (a coefficient that always lies
between +1 and -1), and plot graphs of all the attributes; from this visualization
we identify the important features.
Data preprocessing and transformation of the initial dataset. The steps of data
preprocessing are described below:
- Data cleaning: fill in missing values, resolve inconsistencies and smooth noisy data.
- Data integration: combine multiple databases or files.
- Data transformation: aggregation and normalization.
- Data reduction: reduce the volume of data while preserving similar analytical results.
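The preprocessing steps above can be sketched in Python with pandas. This is only an illustration (it is not part of the report's Weka workflow); the sample rows are taken from the iris listing given later, with one value deliberately blanked out to show cleaning.

```python
import pandas as pd

# A handful of iris rows (one from each class, plus a missing value).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, None, 6.3, 5.8],
    "sepal_width":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

# Data cleaning: fill the missing value with the column mean (5.82 here).
df = df.fillna(df.mean())

# Outlier handling: clip each attribute to its 1.5*IQR fences so values
# outside the expected range are pulled back in.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
df = df.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

# Exploratory analysis: pairwise correlation coefficients, each of which
# always lies between -1 and +1.
corr = df.corr()
print(corr.round(2))
```

On the full 150-instance dataset the same calls apply unchanged; the correlation matrix is what reveals, for example, that petal length and petal width move together.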
CHAPTER 2
OBJECTIVES AND SCOPE
Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data. Data mining is
a promising and flourishing frontier in data analysis, and the results of such
analysis have many applications. Data mining is also referred to as Knowledge
Discovery from Data (KDD). This system functions as the machine-driven or
convenient extraction of patterns representing knowledge implicitly stored or
captured in huge databases, data warehouses, the Web, data repositories, and
information streams. Data mining is a multidisciplinary field, encompassing areas
such as information technology, machine learning, statistics, pattern recognition,
information retrieval, neural networks, knowledge-based systems, artificial
intelligence and data visualization.
Classification steps: select the dataset, then choose a classifier.
Dataset in ARFF format:
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
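The ARFF listing above is Fisher's iris data, and the same 150 instances also ship with scikit-learn. As a sketch only (the report itself uses Weka, not this code), a decision tree comparable in spirit to Weka's J48 (a C4.5-style tree) can be trained and evaluated on a hold-out split like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the 150 iris instances (4 attributes, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# An entropy-based decision tree, roughly analogous to J48's
# information-gain splitting.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)
print(f"hold-out accuracy: {acc:.3f}")
```

The exact accuracy depends on the random split, but it typically lands in the same mid-90s range the report's Weka experiments show.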
CHAPTER 4
RESULT AND DISCUSSIONS
Number of Leaves : 1
Class republican :
-0.41 +
[adoption-of-the-budget-resolution=y] * -0.81 +
[physician-fee-freeze=y] * 1.8 +
[synfuels-corporation-cutback=y] * -0.8
Time taken to build model: 0.66 seconds
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.970 0.048 0.970 0.970 0.970 0.922 0.986 0.987 democrat
0.952 0.030 0.952 0.952 0.952 0.922 0.986 0.972 republican
Weighted Avg. 0.963 0.041 0.963 0.963 0.963 0.922 0.986 0.981
a b <-- classified as
259 8 | a = democrat
8 160 | b = republican
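The summary statistics in the output above follow directly from the confusion matrix (rows are actual classes, columns are predicted classes). Recomputing them by hand:

```python
# Confusion matrix from the Weka output above.
cm = [[259, 8],    # actual democrat:   259 correct, 8 misclassified
      [8, 160]]    # actual republican: 160 correct, 8 misclassified

total = sum(sum(row) for row in cm)
correct = cm[0][0] + cm[1][1]
accuracy = correct / total            # 419/435 = 0.963

tp_rate_dem = cm[0][0] / sum(cm[0])                 # recall for democrat
precision_dem = cm[0][0] / (cm[0][0] + cm[1][0])    # precision for democrat

print(f"accuracy:        {accuracy:.3f}")     # 0.963, matching Weighted Avg.
print(f"TP rate (dem):   {tp_rate_dem:.3f}")  # 0.970, matching the table
print(f"precision (dem): {precision_dem:.3f}")
```

These match the 0.963 weighted average and the 0.970 democrat row in the table above.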
The Naïve Bayes classifier was used for classification, achieving approximately
90% accuracy (weighted TP rate 0.903 in the output below):
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: vote-weka.filters.unsupervised.attribute.Normalize-S1.0-T0.0-
weka.filters.unsupervised.attribute.Normalize-S1.0-T0.0
Instances: 435
Attributes: 17
handicapped-infants
water-project-cost-sharing
adoption-of-the-budget-resolution
physician-fee-freeze
el-salvador-aid
religious-groups-in-schools
anti-satellite-test-ban
aid-to-nicaraguan-contras
mx-missile
immigration
synfuels-corporation-cutback
education-spending
superfund-right-to-sue
crime
duty-free-exports
export-administration-act-south-africa
Class
Test mode: evaluate on training data
Class
Attribute democrat republican
(0.61) (0.39)
===============================================================
handicapped-infants
n 103.0 135.0
y 157.0 32.0
[total] 260.0 167.0
water-project-cost-sharing
n 120.0 74.0
y 121.0 76.0
[total] 241.0 150.0
adoption-of-the-budget-resolution
n 30.0 143.0
y 232.0 23.0
[total] 262.0 166.0
physician-fee-freeze
n 246.0 3.0
y 15.0 164.0
[total] 261.0 167.0
el-salvador-aid
n 201.0 9.0
y 56.0 158.0
[total] 257.0 167.0
religious-groups-in-schools
n 136.0 18.0
y 124.0 150.0
[total] 260.0 168.0
anti-satellite-test-ban
n 60.0 124.0
y 201.0 40.0
[total] 261.0 164.0
aid-to-nicaraguan-contras
n 46.0 134.0
y 219.0 25.0
[total] 265.0 159.0
mx-missile
n 61.0 147.0
y 189.0 20.0
[total] 250.0 167.0
immigration
n 140.0 74.0
y 125.0 93.0
[total] 265.0 167.0
synfuels-corporation-cutback
n 127.0 139.0
y 130.0 22.0
[total] 257.0 161.0
education-spending
n 214.0 21.0
y 37.0 136.0
[total] 251.0 157.0
superfund-right-to-sue
n 180.0 23.0
y 74.0 137.0
[total] 254.0 160.0
crime
n 168.0 4.0
y 91.0 159.0
[total] 259.0 163.0
duty-free-exports
n 92.0 143.0
y 161.0 15.0
[total] 253.0 158.0
export-administration-act-south-africa
n 13.0 51.0
y 174.0 97.0
[total] 187.0 148.0
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.891 0.077 0.948 0.891 0.919 0.802 0.974 0.984 democrat
0.923 0.109 0.842 0.923 0.881 0.802 0.974 0.960 republican
Weighted Avg. 0.903 0.089 0.907 0.903 0.904 0.802 0.974 0.975
a b <-- classified as
238 29 | a = democrat
13 155 | b = republican
Number of Leaves : 6
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.978 0.036 0.978 0.978 0.978 0.942 0.986 0.987 democrat
0.964 0.022 0.964 0.964 0.964 0.942 0.986 0.970 republican
Weighted Avg. 0.972 0.031 0.972 0.972 0.972 0.942 0.986 0.981
a b <-- classified as
261 6 | a = democrat
6 162 | b = republican
Fig 4.6 J48
Cross validation performed on Naïve Bayes:
Fig 4.9
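Weka's default evaluation uses 10-fold cross-validation. The same idea can be sketched with scikit-learn (an illustration only, not the report's actual Weka run), here with Gaussian Naïve Bayes on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# 10-fold cross-validation: the data is split into 10 parts, and each
# part serves once as the test fold while the other 9 train the model.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

Averaging over the 10 folds gives a more reliable accuracy estimate than evaluating on the training data, which is why the cross-validated figures can differ from the training-set results reported earlier.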
So we have concluded that the LMT algorithm works best for our iris
flower dataset analysis, giving an accuracy of 97%, and is hereby considered
suitable enough for analyzing the given dataset.
CHAPTER 5
ADVANTAGES AND DISADVANTAGES
ADVANTAGES:
1. Freely available under the GNU General Public License.
2. Portability, since it is fully implemented in the Java programming language.
3. Runs on almost any modern computing platform.
4. Ease of use due to its graphical user interface.
DISADVANTAGES:
1. It can only handle small datasets.
2. Blockchain can be a thing to be considered.
3. Using it via the command line is a pain without the readline
capability of the shell.
CHAPTER 6
CONCLUSION
Finally, after all the analysis, we obtained the results for the corresponding dataset. We
observe that J48 is the best classification algorithm analyzed; it is followed by Naïve Bayes
and LMT, whose accuracies are close to that of J48. At some points Naïve Bayes and LMT
show the same level of accuracy. We have concluded that the LMT algorithm works well
for our iris dataset analysis, giving an accuracy of 97%, and is hereby considered suitable
enough for analyzing the given dataset.
REFERENCES
1. https://storm.cis.fordham.edu/~gweiss/data-mining/weka-data/iris.arff
2. https://www.cs.waikato.ac.nz/ml/weka/Witten_et_al_2016_appendix.pdf
3. https://courses.soe.ucsc.edu/courses/tim245/Spring12/01/pages/attached-files/attachments/11549