KNN classification on the Iris dataset

License: CC 4.0 BY-SA

Tags: #iris #Python #knn

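This article describes a KNN classification experiment on the Iris dataset, covering the dataset information, its source, and the basic setup of the KNN classification. It compares different values of k, standardized versus unstandardized data, and different standardization methods, to show how the choice of k and the data preprocessing affect model performance. In the experiments, unstandardized data with k=3 gives a low error rate, while the choice of standardization method has a marked effect on the results.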


Dataset description

Data Set Information:

This is perhaps the best known database to be found in the pattern recognition literature. Fisher’s paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Predicted attribute: class of iris plant.

This is an exceedingly simple domain.

This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick ‘@’ espeedaz.net). The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" where the error is in the fourth feature. The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" where the errors are in the second and third features.

Attribute Information:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
– Iris Setosa
– Iris Versicolour
– Iris Virginica
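
The record layout above (four measurements in cm followed by the class) and the two corrected samples quoted in the dataset notes can be sketched in code. This is a minimal sketch, assuming the data arrives as plain comma-separated text with no header row; `parse_record`, `load_records`, and `CORRECTIONS` are hypothetical names, not part of the original `iris.py`.

```python
import csv

# Corrected records quoted in the dataset notes, keyed by 1-based sample number.
CORRECTIONS = {
    35: "4.9,3.1,1.5,0.2,Iris-setosa",
    38: "4.9,3.6,1.4,0.1,Iris-setosa",
}


def parse_record(line):
    """Turn one comma-separated record into (list of 4 floats, class label)."""
    *values, label = next(csv.reader([line]))
    return [float(v) for v in values], label


def load_records(text, corrections=CORRECTIONS):
    """Parse every non-empty line, then overwrite the corrected samples."""
    records = [parse_record(line) for line in text.splitlines() if line.strip()]
    for sample_no, line in corrections.items():
        if sample_no <= len(records):
            records[sample_no - 1] = parse_record(line)
    return records
```

Whether the corrections should be applied depends on which copy of the file is used; the sketch overwrites samples 35 and 38 unconditionally.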

Dataset source

http://archive.ics.uci.edu/ml/
The repository hosts many machine learning datasets that are useful for practice.

Notes on the KNN classification

Link to the KNN algorithm
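
For readers without the linked article at hand, the core of KNN can be sketched as follows. This is a minimal sketch, not the code from the original `iris.py`: it assumes Euclidean distance and a simple majority vote, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter


def knn_classify(query, train_x, train_y, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample, paired with its label
    distances = [(math.dist(query, x), y) for x, y in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest samples; ties are broken arbitrarily
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

An odd k such as 3 reduces the chance of tied votes, although with three classes ties remain possible.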
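1. Randomly split the data into a training set and a test set, at ratios of 7:3 and 6:4.
2. Decide whether to standardize the data and, if so, which standardization method to use.
3. Choose a suitable k.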
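
The first two steps can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the source does not say which standardization methods were compared, so min-max scaling and z-score scaling are shown as two common candidates, and the function names are hypothetical.

```python
import random
import statistics


def train_test_split(samples, train_fraction=0.6, seed=None):
    """Shuffle a copy of `samples` and cut it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def min_max_scale(rows):
    """Rescale each column to [0, 1]; a constant column maps to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def z_score_scale(rows):
    """Rescale each column to zero mean and unit population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]
```

With 150 samples, `train_fraction=0.6` yields 90 training and 60 test samples, and `train_fraction=0.7` yields 105 and 45. Step 3 then amounts to repeating the evaluation for several values of k and comparing the error rates.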

Test results

Only the 6:4 split is discussed here, since a larger training set generally gives better predictions.
1. No standardization, k=3

>>> reload(iris)
<module 'iris' from 'iris.py'>
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
1.0 0.983333333333
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
2.0 0.966666666667
>>> iris.irisClassTest()
1.0 0.983333333333
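
The source does not show the body of `irisClassTest`, but each printed pair is consistent with an error count followed by the accuracy on a 60-sample test set (40% of 150): 2 errors give 1 - 2/60 ≈ 0.9667 and 1 error gives 1 - 1/60 ≈ 0.9833. A hypothetical helper that produces such a pair might look like this:

```python
def evaluate(predicted, actual):
    """Return (error_count, accuracy) for two equal-length label sequences."""
    errors = sum(1 for p, a in zip(predicted, actual) if p != a)
    return float(errors), 1.0 - errors / len(actual)
```

Because the split is random, the error count varies from run to run, which matches the alternation between 1.0 and 2.0 in the transcript.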