
Since 2001

Bhartiya Gramin Punarrachna Sanstha’s


Hi-Tech Institute of Technology, Aurangabad
A Pioneer to Shape Global Technocrats
Approved By AICTE, DTE Govt. of Maharashtra & Affiliated to Dr. Babasaheb Ambedkar Technological University, Lonere, Raigad
P-119, Bajajnagar, MIDC Waluj, Aurangabad, Maharashtra, India - 431136. P: (0240) 2552240, 2553495, 2553496. Web: http://hitechengg.edu.in/

DEPARTMENT: COMPUTER SCIENCE & ENGINEERING

PRACTICAL EXPERIMENT INSTRUCTION SHEET

EXPERIMENT TITLE: IMPLEMENT K-NEAREST NEIGHBOURS CLASSIFICATION USING PYTHON.

EXPERIMENT NO.: 03 SUBJECT: Machine Learning

CLASS: TE CSE SEMESTER: VI
Aim: IMPLEMENT K-NEAREST NEIGHBOURS CLASSIFICATION USING PYTHON.
Hardware Requirement:

Intel(R) Core(TM) i5-10505 CPU @ 3.20 GHz Processor, 8 GB RAM, 256 GB HDD, 20” LCD
Monitor, Keyboard, Mouse.
Software Requirement:
OS (Windows/Linux), Python 3.x, PyCharm, and Anaconda.
DESCRIPTION:
This algorithm is used to solve classification problems. The K-nearest neighbours (K-NN)
algorithm effectively creates an imaginary boundary to classify the data: when a new data point
arrives, the algorithm assigns it to the class on the nearest side of that boundary.
A larger k value therefore produces smoother separation curves and a less complex model,
whereas a smaller k value tends to overfit the data, resulting in a more complex model.
It is very important to choose the right k value when analysing a dataset, in order to avoid
both overfitting and underfitting.
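The effect of k described above can be seen directly by cross-validating a few candidate values on the Iris dataset. This is a minimal sketch for illustration (the specific k values chosen here are arbitrary, not part of the lab):

```python
# Compare mean cross-validated accuracy for several k values on Iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15, 45):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(f"k={k:2d}  mean CV accuracy={score:.3f}")
```

Very small k tends to track noise in the training folds, while very large k over-smooths the class boundaries; intermediate values usually score best.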
KNN MODEL REPRESENTATION:
The model representation for KNN is the entire training dataset. It is as simple as that.
KNN has no model other than storing the entire dataset, so there is no learning required.
Efficient implementations can store the data using complex data structures like k-d trees to make
look-up and matching of new patterns during prediction efficient.
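As a small illustration of such a structure (not part of the lab code), scikit-learn's KDTree can answer "which stored points are nearest to this query?" without scanning every stored point:

```python
# Build a k-d tree over stored points, then query the nearest neighbours
# of a new point; this is the look-up step the text describes.
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 4))       # 1000 stored "training" points
tree = KDTree(points)                # build the tree once

query = rng.random((1, 4))           # one new, unseen point
dist, idx = tree.query(query, k=3)   # its 3 nearest stored points
print("nearest indices:", idx[0])
print("their distances:", dist[0])
```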

Because the entire training dataset is stored, you may want to think carefully about the
consistency of your training data. It might be a good idea to curate it, update it often as new data
becomes available and remove erroneous and outlier data.
 The k-nearest neighbours algorithm is imported from the scikit-learn package.
 Create feature and target variables.
 Split the data into training and test data.
 Generate a k-NN model using the neighbours value.
 Train (fit) the model on the training data.
 Predict on unseen data.
1. What is the KNN Algorithm?
KNN (K-nearest neighbours) is a supervised, non-parametric algorithm that can be used to
solve both classification and regression problem statements.
It uses labelled data, i.e. data in which a target column is present, to model a function that
produces an output for unseen data. It uses the Euclidean distance formula to compute the
distance between data points for classification or prediction.
The main idea behind the algorithm is that similar data points lie close to each other, so it
uses distance to find the points most similar to a given query point.
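The Euclidean distance mentioned above can be computed explicitly for two iris-style samples; the two sample vectors below are illustrative values, not taken from the lab:

```python
# Euclidean distance between two 4-feature samples, computed from the
# definition: the square root of the sum of squared differences.
import math

a = [5.4, 3.4, 1.7, 0.2]   # sample 1: sepal/petal measurements (cm)
b = [5.1, 3.5, 1.4, 0.2]   # sample 2

d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(round(d, 4))          # -> 0.4359 (same result as math.dist(a, b))
```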
2. Why is KNN a non-parametric Algorithm?
The term "non-parametric" means the algorithm makes no assumptions about the underlying
data distribution, and such methods have no fixed number of parameters.
In KNN, the number of effective parameters grows with the training data, since each training
case acts as a parameter of the model. So, KNN is a non-parametric algorithm.
3. What is “K” in the KNN Algorithm?
K represents the number of nearest neighbours used to predict the class of a given item that
arrives as unseen data for the model.

4. Why is the odd value of “K” preferred over even values in the KNN Algorithm?
An odd value of K is preferred over an even value in order to ensure that there are no ties
in the voting. A common rule of thumb is to start from the square root of the number of data
points; if that value is even, add or subtract 1 to make it odd.
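The rule of thumb above can be sketched as a small helper; `suggest_k` is a hypothetical name introduced here for illustration:

```python
# Suggest an odd k: round the square root of the sample count, and
# nudge even results to the next odd value.
import math

def suggest_k(n_samples):
    k = round(math.sqrt(n_samples))
    if k % 2 == 0:          # even -> make it odd, as the text suggests
        k += 1
    return max(1, k)

print(suggest_k(105))   # sqrt(105) ~ 10.25 -> 10 -> 11
```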
5. How does the KNN algorithm make the predictions on the unseen dataset?
The following operations happen for each unseen (test) data point; the kNN classifier must:
Step-1: Calculate the distances from the test point to all points in the training set and store them
Step-2: Sort the calculated distances in increasing order
Step-3: Store the K nearest points from the training dataset
Step-4: Calculate the proportion of each class among those K points
Step-5: Assign the class with the highest proportion
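The five steps above can be written out from scratch in a few lines. This is an illustrative sketch only (the lab itself uses scikit-learn's KNeighborsClassifier), with a tiny made-up 2-D dataset:

```python
# A from-scratch KNN classifier following Steps 1-5 above.
import math
from collections import Counter

def knn_predict(train_X, train_y, test_point, k=3):
    # Step 1: distance from the test point to every training point
    dists = [(math.dist(test_point, x), y) for x, y in zip(train_X, train_y)]
    # Step 2: sort the distances in increasing order
    dists.sort(key=lambda t: t[0])
    # Step 3: keep the labels of the k nearest points
    nearest = [y for _, y in dists[:k]]
    # Steps 4-5: count each class and return the most frequent one
    return Counter(nearest).most_common(1)[0][0]

train_X = [[1, 1], [1, 2], [5, 5], [6, 5]]
train_y = ["a", "a", "b", "b"]
print(knn_predict(train_X, train_y, [1.5, 1.5], k=3))  # -> a
```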
Source Code:
''' Aim : Implement k-nearest neighbours classification using python.

=================================
Explanation:
=================================
===> To run this program you need to install the sklearn module
===> Open a command prompt and then execute the following command to install the sklearn
module
---> pip install scikit-learn
-----------------------------------------------
In this program we use the Iris dataset. The dataset is split into a training set (70%) and a test
set (30%).
The Iris dataset contains the following features:

---> sepal length (cm)


---> sepal width (cm)
---> petal length (cm)
---> petal width (cm)
The Sample data in iris dataset format is [5.4 3.4 1.7 0.2]
Where 5.4 ---> sepal length (cm)
3.4 ---> sepal width (cm)
1.7 ---> petal length (cm)
0.2 ---> petal width (cm)
===============================
Source Code :
==============================='''
# Import necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import random

# Loading data
data_iris = load_iris()

# To get the list of target names
label_target = data_iris.target_names

print()
print("Sample Data from Iris Dataset")
print("*" * 30)

# Display a few random samples from the iris dataset
for i in range(10):
    rn = random.randint(0, 120)
    print(data_iris.data[rn], "===>", label_target[data_iris.target[rn]])

# Create feature and target arrays
X = data_iris.data
y = data_iris.target

# Split into training and test set (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

print("The Training dataset length: ", len(X_train))
print("The Testing dataset length: ", len(X_test))

try:
    nn = int(input("Enter number of neighbors :"))
    knn = KNeighborsClassifier(n_neighbors=nn)
    knn.fit(X_train, y_train)

    # Display the accuracy score on the test set
    print("The Score is :", knn.score(X_test, y_test))

    # Get a test sample from the user, e.g. 5.4,3.4,1.7,0.2
    test_data = input("Enter Test Data :").split(",")
    test_data = [float(value) for value in test_data]
    print()

    v = knn.predict([test_data])
    print("Predicted output is :", label_target[v[0]])
except ValueError:
    print("Please supply valid input......")

Output:

Conclusion: KNN is an effective machine learning algorithm that can be used in credit scoring,
prediction of cancer cells, image processing, and many other applications. Its main advantage
is that it is easy to implement and works well with small datasets.
