
Introduction to machine learning

K Nearest Neighbours

1
Introduction to machine learning
K Nearest Neighbors -

a. The KNN classifier is also a non-parametric and instance-based learning algorithm.
I. Non-parametric means it makes no assumptions about the distribution of the data and thus avoids the risk of mis-specifying the underlying distribution. For example, suppose the data is non-Gaussian but the learning model assumes a Gaussian form; in that case, the algorithm would make extremely poor predictions.
II. Instance-based learning means that the algorithm does not explicitly learn a model. It simply memorizes (keeps in RAM) the training instances, which are subsequently used to predict the classes of unseen data. Minimal training, but high cost at testing time!

b. For classification, the algorithm takes a majority vote among the K most
similar instances to a given "unseen" observation (see the sketch after this list). K is a positive integer.

c. Suited for classification problems where the relationships between features and target classes
are complex and difficult to understand, yet items within a class tend
to be fairly homogeneous in their attribute values.

d. Not suitable if the data is noisy and the target classes do not have a clear
demarcation in terms of attribute values.
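
A minimal sketch of this majority-vote idea in scikit-learn. The tiny toy arrays and K = 3 below are illustrative assumptions, not part of the slides:

# Illustrative sketch of KNN classification with scikit-learn (toy data assumed)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                    [6.0, 9.0], [1.2, 0.9], [5.5, 7.5]])
y_train = np.array([0, 0, 1, 1, 0, 1])      # class labels of the training instances

knn = KNeighborsClassifier(n_neighbors=3)    # K = 3: majority vote among the 3 nearest points
knn.fit(X_train, y_train)                    # "training" is just storing the instances

x_unseen = np.array([[1.1, 1.5]])            # a new, unseen observation
print(knn.predict(x_unseen))                 # predicted class, by majority vote
print(knn.predict_proba(x_unseen))           # vote proportions interpreted as probabilities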

2
Introduction to machine learning

K Nearest Neighbors -

e. The training data is represented by the scattered data points in the feature
space.
f. The color of the data points indicates the class they belong to.
g. The grey point is the query point whose class has to be determined.

3
Introduction to machine learning

K Nearest Neighbors – (Similarity Measurements)

a. Similarity is measured by the distance between points, typically using the Euclidean method (see the sketch below).

b. Other distance measures include Manhattan distance, Minkowski distance,
Mahalanobis distance, Bhattacharyya distance, etc.
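
As an illustration only (the two points below are assumed), a few of these distances can be computed directly in NumPy:

# Illustrative sketch: Euclidean, Manhattan and Minkowski distances between two assumed points
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((p - q) ** 2))             # sqrt of the sum of squared differences
manhattan = np.sum(np.abs(p - q))                     # sum of absolute differences
minkowski3 = np.sum(np.abs(p - q) ** 3) ** (1 / 3)    # Minkowski distance with p = 3

print(euclidean, manhattan, minkowski3)               # 5.0, 7.0, ~4.5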

4
Introduction to machine learning

K Nearest Neighbors based classifications -

a. The distance formula is highly dependent on how the features / attributes /
dimensions are measured.

b. Dimensions with a larger possible range of values will dominate the
result of the distance calculation when using the Euclidean formula.

c. To ensure all dimensions have a similar scale, we normalize the data on all
dimensions / attributes.

d. There are multiple ways of normalizing the data. We will use Z-score
standardization (sketched below).
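
A sketch of Z-score standardization, done by hand and with scikit-learn's StandardScaler. The sample matrix is an assumption for illustration:

# Illustrative sketch of Z-score standardization: (x - mean) / std, per dimension
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[170.0, 60000.0],
              [160.0, 45000.0],
              [180.0, 52000.0]])             # two features on very different scales

X_manual = (X - X.mean(axis=0)) / X.std(axis=0)   # manual Z-score

scaler = StandardScaler()                    # same transform; fit on training data,
X_scaled = scaler.fit_transform(X)           # reuse the fitted scaler on test data

print(np.allclose(X_manual, X_scaled))       # True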

5
Introduction to machine learning

K Nearest Neighbors based classifications –

There are many distance calculation formulas in the scikit-learn package (a usage sketch follows the references below):

1. Minkowski distance
2. Euclidean distance
3. Manhattan distance
4. Chebyshev distance
5. Mahalanobis distance
6. Inner product
7. Cosine similarity
8. Pearson correlation
9. Hamming distance
10. Jaccard similarity
11. Edit distance or Levenshtein distance

Ref:
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/
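
A brief sketch of how a few of these metrics can be accessed through the DistanceMetric class referenced above. The import path matches the linked documentation; in newer scikit-learn releases the same class lives under sklearn.metrics:

# Illustrative sketch: pairwise distances under a few metrics from scikit-learn
import numpy as np
from sklearn.neighbors import DistanceMetric   # sklearn.metrics.DistanceMetric in newer versions

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

for name in ["euclidean", "manhattan", "chebyshev"]:
    dist = DistanceMetric.get_metric(name)
    print(name)
    print(dist.pairwise(X))                    # 3 x 3 matrix of pairwise distances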
6
Introduction to machine learning

K Nearest Neighbors - (Methodology)

a. Technically, given a positive integer K, an unseen observation x and a similarity
metric d, the KNN algorithm performs two steps:

b. It computes the distance of the data point x from all the other data points in the
training set, arranges them in ascending order, and takes the top K observations. Let this
set be A. K is usually odd.

c. It estimates the conditional probability P(Y = j | X = x) = (1/K) · Σ over i in A of I(y_i = j)

d. I is an indicator function which returns 1 if a point in the set A of K items is from class j, and 0 otherwise. In
simple language: what proportion of the K neighbors belongs to each class. The proportion is the
probability, and the sum of all the probabilities is 1 (see the sketch after this list).

e. The data point x is assigned the class with the maximum probability.

f. An alternate way of understanding KNN is as a method that calculates decision
boundaries in the feature space, a.k.a. Voronoi regions or tessellations.
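
The two steps and the conditional probability above can be written directly in NumPy. The toy data and K = 3 are assumptions for illustration:

# Illustrative from-scratch sketch of the KNN steps described above
import numpy as np

def knn_predict(X_train, y_train, x, K=3):
    # Step 1: distances from x to every training point, sorted ascending, keep top K (set A)
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    A = np.argsort(distances)[:K]

    # Step 2: conditional probability P(Y = j | X = x) = (1/K) * sum over A of I(y_i = j)
    classes = np.unique(y_train)
    probs = np.array([np.mean(y_train[A] == j) for j in classes])

    # Assign the class with the maximum probability
    return classes[np.argmax(probs)], dict(zip(classes, probs))

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1, 0])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), K=3))   # class 0; probabilities sum to 1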

7
Introduction to machine learning

K Nearest Neighbors - (Voronoi Diagram / Tessellations)

a. The Voronoi diagram is formed from lines that bisect, and are perpendicular to,
the lines connecting two neighboring data points
b. Each point s has a Voronoi cell V(s) consisting of all points closer to s than to
any other point

Voronoi boundaries created using the nearest neighbor method, i.e. K = 1

8
Introduction to machine learning

K Nearest Neighbors - (K the magic)

a. How to pick the right K? K can range from 1 to the number of training data points!
b. The value of K can affect the performance of the classifier
c. K in KNN is a hyperparameter. It has to be discovered through iteration!
d. Since we will be evaluating a hyperparameter, we need to ensure the data is split
into three sets, i.e. training, validation and testing.
e. The iteration to find K should use only the training and validation data (see the sketch after this list)
f. We can imagine K as a way of influencing the shape of the boundary between
classes
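
A sketch of the iteration over candidate K values using a separate validation split. The synthetic data and the candidate range are assumptions:

# Illustrative sketch: pick K on a validation set, then report accuracy on the held-out test set
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=5, random_state=0)   # assumed synthetic data
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_k, best_acc = None, -1.0
for k in range(1, 26, 2):                    # odd K values only; an assumed search range
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

print("best K on validation:", best_k)
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))   # test data used only once, at the end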

9
Introduction to machine learning
K Nearest Neighbors - (K and Voronoi boundaries)

Image Source : https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/

For K = 1:
a. K = 1 creates Voronoi boundaries based on individual points. Each point has a region around
itself as its domain.
b. The boundaries have sharp bends and there are many islands. The surface represents a complex
model likely to suffer from variance error.

For a large K:
a. A large value of K means a larger spread of data points is considered to decide the boundary.
b. The boundary will be relatively smooth with little or no sharp turns. Islands will be
minimized and variance error will be low, but bias error increases.
10
Introduction to machine learning

K Nearest Neighbors based classifications -


Advantages -
1. Makes no assumptions about the distributions of classes in the feature space
2. Works for multi-class problems directly
3. Easy to implement and understand
4. Relatively robust to outliers (for larger values of K)

Disadvantages -
1. Fixing the optimal value of K is a challenge
2. Will not be effective when the class distributions overlap
3. Does not output a model; it calculates distances for every new point (lazy learner)
4. Computationally intensive (O(D·N²) for pairwise distances); this can be addressed using KD-tree
algorithms, which take time to build (a sketch follows)
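
As noted in point 4, a KD-tree index trades preparation time for faster queries; in scikit-learn this is a constructor option. The data below is an assumption for illustration:

# Illustrative sketch: build a KD-tree index instead of brute-force neighbor search
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))                 # assumed data; low dimensionality suits KD-trees
y = (X[:, 0] + X[:, 1] > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
knn.fit(X, y)                                    # tree construction happens here
print(knn.predict(rng.normal(size=(3, 3))))      # subsequent queries avoid scanning all points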

11
Introduction to machine learning

K Nearest Neighbors based classifications -

Lab 3 - Model the given data to predict the type of breast cancer

Description - Sample data is available at
https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)

Creator:
Dr. William H. Wolberg (physician)
University of Wisconsin Hospitals, Madison, Wisconsin, USA

Donor:
Olvi Mangasarian (mangasarian '@' cs.wisc.edu)
Received by David W. Aha (aha '@' cs.jhu.edu)

The dataset has 10 attributes listed below:
1. Sample code number: id number
2. Clump Thickness: 1 - 10
3. Uniformity of Cell Size: 1 - 10
4. Uniformity of Cell Shape: 1 - 10
5. Marginal Adhesion: 1 - 10
6. Single Epithelial Cell Size: 1 - 10
7. Bare Nuclei: 1 - 10
8. Bland Chromatin: 1 - 10
9. Normal Nucleoli: 1 - 10
10. Mitoses: 1 - 10

Sol: KNN+Breast+Cancer+Modeling.ipynb
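
A hedged outline of how the lab could be approached. The local file name, the shortened column names, and dropping the '?' values in Bare Nuclei are assumptions; refer to the solution notebook for the actual workflow:

# Illustrative outline for the lab; file name, column names and preprocessing are assumptions
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

cols = ["id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
        "epithelial_size", "bare_nuclei", "chromatin", "nucleoli", "mitoses", "class"]
df = pd.read_csv("breast-cancer-wisconsin.data", names=cols, na_values="?").dropna()

X = df.drop(columns=["id", "class"])             # drop the id; it carries no signal
y = df["class"]                                   # 2 = benign, 4 = malignant in the UCI coding

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

scaler = StandardScaler().fit(X_train)            # Z-score standardization, fitted on training data
knn = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(X_train), y_train)
print("test accuracy:", knn.score(scaler.transform(X_test), y_test))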

12
