Dataset description
Data Set Information:
This is perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.
Predicted attribute: class of iris plant.
This is an exceedingly simple domain.
This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa", where the error is in the fourth feature. The 38th sample should be: 4.9,3.6,1.4,0.1,"Iris-setosa", where the errors are in the second and third features.
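If the copy of the file in use still carries these two errors, they can be patched after loading. The following is a minimal sketch, assuming the rows are held in a list in the original file order (so sample 35 sits at index 34) and that each row is a tuple of four floats plus the class name. The helper name `apply_fisher_corrections` is hypothetical; the replacement values are the ones quoted above.

```python
# Corrected rows as quoted in the dataset notes, keyed by 1-based sample number.
CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: (4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}


def apply_fisher_corrections(rows):
    """Return a copy of rows with samples 35 and 38 replaced.

    Assumes rows are in the original file order and each row is a
    (sepal_len, sepal_wid, petal_len, petal_wid, label) tuple.
    """
    fixed = list(rows)  # shallow copy, so the caller's list is left untouched
    for sample_no, corrected in CORRECTIONS.items():
        fixed[sample_no - 1] = corrected  # 1-based sample number to 0-based index
    return fixed
```

Applying the patch to a copy keeps the raw data available for comparison.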
Attribute Information:
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class:
  - Iris Setosa
  - Iris Versicolour
  - Iris Virginica
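The attribute list above maps onto one comma-separated record per sample. A minimal parsing sketch, assuming each line holds four floats followed by the class name (for example `4.9,3.1,1.5,0.2,Iris-setosa`); `parse_iris_line` is an illustrative name and is not taken from the original code.

```python
def parse_iris_line(line):
    """Split one record into (features, label).

    Assumes the layout 'sepal_len,sepal_wid,petal_len,petal_wid,class'.
    """
    parts = line.strip().split(",")
    if len(parts) != 5:
        raise ValueError("expected 5 comma-separated fields, got %d" % len(parts))
    features = [float(x) for x in parts[:4]]  # the four measurements, in cm
    label = parts[4]                          # class name, kept as a string
    return features, label


features, label = parse_iris_line("4.9,3.1,1.5,0.2,Iris-setosa\n")
print(features, label)
```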
Dataset source
http://archive.ics.uci.edu/ml/
The repository hosts many machine learning datasets that can be used for practice.
KNN classification notes
Link to the KNN algorithm
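1. Randomly split the data into a training set and a test set
7:3 and 6:4 (training set : test set)
2. Decide whether to standardize the data, and which standardization method to use
3. Choose a suitable k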
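The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the `iris.py` module used in the tests: all function names are hypothetical, min-max scaling stands in for whichever standardization method is chosen, distance is Euclidean, and `math.dist` needs Python 3.8 or later.

```python
import math
import random
from collections import Counter


def split_train_test(samples, train_ratio=0.6, seed=None):
    """Shuffle and split samples; train_ratio=0.6 gives a 6:4 split."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def min_max_params(train_features):
    """Per-column (min, max), computed from the training features only."""
    return [(min(col), max(col)) for col in zip(*train_features)]


def min_max_scale(features, params):
    """Rescale one feature vector with the given per-column (min, max).

    Training rows land in [0, 1]; test rows may fall slightly outside.
    A constant column is mapped to 0.0 to avoid dividing by zero.
    """
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, (lo, hi) in zip(features, params)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to query.

    train is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration, not taken from the iris file).
toy = [([1.0, 1.0], "a"), ([1.2, 0.9], "a"), ([0.9, 1.1], "a"),
       ([5.0, 5.0], "b"), ([5.1, 4.8], "b"), ([4.9, 5.2], "b")]
print(knn_predict(toy, [1.1, 1.0], k=3))
```

Taking the scaling parameters from the training set alone keeps information about the test samples out of the model. To compare values of k, the same split can be reused while only `k` changes.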
Test results
Only the 6:4 split is discussed here; a split with a larger training set tends to predict better.
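1. No standardization, k=3. Each call prints two numbers, which appear to be the count of misclassified test samples and the accuracy (2 errors out of 60 test samples gives 0.9667).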
>>> reload(iris)
<module 'iris' from 'iris.py'>
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
1.0 0.983333333333
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
1.0 0.983333333333