
Algorithms: K-Nearest Neighbors

1
Simple Analogy
• Tell me about your friends (who your neighbors are)
and I will tell you who you are.

2
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning

3
Instance-based Learning

It is very similar to a desktop!

4
What is instance-based Learning?
In machine learning, instance-based
learning (sometimes called memory-based learning)
is a family of learning algorithms that, instead of
performing explicit generalization, compare new
problem instances with instances seen in training, which
have been stored in memory.

Because computation is postponed until a new instance is
observed, these algorithms are sometimes referred to as
"lazy."
What is instance-based Learning? (Cont.)
It is called instance-based because it constructs
hypotheses directly from the training instances
themselves. This means that the hypothesis complexity
can grow with the data: in the worst case, a hypothesis
is a list of n training items and the computational
complexity of classifying a single new instance is O(n).

Advantage:
Ability to adapt its model to previously unseen data.
Examples are:
• KNN
• RBF networks
Why Is The KNN Called A “Lazy Learner” Or A
“Lazy Algorithm”?
KNN is called a lazy learner because when we supply training data
to this algorithm, the algorithm does not train itself at all.

KNN does not learn any discriminative function from the training
data; instead, it memorizes the entire training dataset.

There is no training time in KNN.

But this skipping of training time comes with a cost.

Each time a new data point comes in and we want to make a
prediction, the KNN algorithm searches for the nearest
neighbors in the entire training set.

Hence the prediction step becomes more time-consuming and
computationally expensive.
What is KNN?
• A powerful classification algorithm used in pattern
recognition.

• K-nearest neighbors stores all available cases and
classifies new cases based on a similarity measure
(e.g., a distance function).

• One of the top data mining algorithms used today.

• A non-parametric lazy learning algorithm (an instance-
based learning method).

8
KNN: Classification Approach

• An object (a new instance) is classified by a
majority vote of its neighbors' classes.
• The object is assigned to the most common class
amongst its K nearest neighbors (measured by a
distance function).

9
10
Distance Measure

[Diagram: compute the distance from the test record to the training
records, then choose k of the "nearest" records.]

11
Different Distance Measures

12
Distance Measures for Continuous Variables
Euclidean distance

Manhattan distance

Minkowski distance
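
Written out explicitly (standard textbook formulations; the slide itself shows them only as images), for two points X = (x1, ..., xn) and Y = (y1, ..., yn):

$D_{\text{Euclidean}}(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

$D_{\text{Manhattan}}(X, Y) = \sum_{i=1}^{n} |x_i - y_i|$

$D_{\text{Minkowski}}(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$

(Minkowski distance reduces to Manhattan distance for p = 1 and to Euclidean distance for p = 2.)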

13
Distance Between Neighbors
• Calculate the distance between the new example
(E) and all examples in the training set.

• Euclidean distance between two examples:

– X = [x1, x2, x3, ..., xn]
– Y = [y1, y2, y3, ..., yn]

– The Euclidean distance between X and Y is defined as:

  $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

14
K-Nearest Neighbor Algorithm
• All the instances correspond to points in an n-dimensional
feature space.

• Each instance is represented with a set of numerical
attributes.

• Each training example consists of a feature vector and a
class label associated with that vector.

• Classification is done by comparing the feature vector of the
new point with those of the K nearest training points.

• Select the K nearest examples to E in the training set.

• Assign E to the most common class among its K nearest
neighbors.
15
KNN: Pseudocode
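
The pseudocode appears as an image in the original slide; the following is a minimal plain-Python sketch of the same procedure (function and variable names are illustrative, not from the slide), assuming numeric feature vectors and a simple majority vote over the k smallest distances:

import math
from collections import Counter

def euclidean(x, y):
    # Euclidean distance between two equal-length numeric vectors
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_classify(training_data, new_point, k=3):
    # training_data: list of (feature_vector, class_label) pairs
    # 1. Compute the distance from the new point to every stored training example
    distances = [(euclidean(features, new_point), label)
                 for features, label in training_data]
    # 2. Keep the k closest training examples
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Return the most common class label among those k neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]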
3-KNN: Example
(Income is used in thousands of dollars in the distance calculation.)

Customer | Age | Income | No. of credit cards | Class | Distance from John
George   | 35  | 35K    | 3                   | No    | sqrt[(35-37)^2 + (35-50)^2 + (3-2)^2] = 15.16
Rachel   | 22  | 50K    | 2                   | Yes   | sqrt[(22-37)^2 + (50-50)^2 + (2-2)^2] = 15
Steve    | 63  | 200K   | 1                   | No    | sqrt[(63-37)^2 + (200-50)^2 + (1-2)^2] = 152.23
Tom      | 59  | 170K   | 1                   | No    | sqrt[(59-37)^2 + (170-50)^2 + (1-2)^2] = 122
Anne     | 25  | 40K    | 4                   | Yes   | sqrt[(25-37)^2 + (40-50)^2 + (4-2)^2] = 15.74
John     | 37  | 50K    | 2                   | ?     | => YES (3 nearest neighbors: Rachel, George, Anne)

17
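
The table's result can be reproduced with the knn_classify sketch given after the pseudocode slide (income is entered in thousands, matching the distance column; the list below is illustrative):

customers = [
    ((35, 35, 3), "No"),   # George
    ((22, 50, 2), "Yes"),  # Rachel
    ((63, 200, 1), "No"),  # Steve
    ((59, 170, 1), "No"),  # Tom
    ((25, 40, 4), "Yes"),  # Anne
]
john = (37, 50, 2)  # Age, income (in K), number of credit cards
print(knn_classify(customers, john, k=3))
# -> "Yes": the three nearest neighbors are Rachel (15), George (15.16), Anne (15.74)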
How to choose K?

• If K is too small, the classifier is sensitive to noise points.

• A larger K works well, but too large a K may include majority
points from other classes.

• A rule of thumb is K < sqrt(n), where n is the number of
examples (see the cross-validation sketch below).

18
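
In practice K is often chosen by cross-validation: try several candidate values and keep the one with the best validation accuracy. A minimal sketch using scikit-learn (assuming it is installed, and assuming X and y hold the feature matrix and labels) could look like this:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def choose_k(X, y, candidate_ks=(1, 3, 5, 7, 9)):
    # Return the K with the highest mean 5-fold cross-validation accuracy
    scores = {}
    for k in candidate_ks:
        model = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(model, X, y, cv=5).mean()
    return max(scores, key=scores.get)

Odd values of K are usually preferred for two-class problems because they avoid tied votes.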
19
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor.]

The k-nearest neighbors of a record x are the data points
that have the k smallest distances to x.

20
KNN Feature Weighting

• Scale each feature by its importance for


classification

• Can use our prior knowledge about which features


are more important
• Can learn the weights wk using cross‐validation

21
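
As a concrete illustration of a weighted distance (a common variant; the weights here are placeholders, not values from the slide):

import math

def weighted_euclidean(x, y, weights):
    # Each squared feature difference is scaled by its importance weight w_k
    return math.sqrt(sum(w * (xi - yi) ** 2
                         for w, xi, yi in zip(weights, x, y)))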
Feature Normalization
• The distance between neighbors can be dominated
by attributes with relatively large numeric ranges,
e.g., the income of customers in our previous example.

• This arises when two features are on different scales.

• It is important to normalize such features:
– mapping values to numbers between 0 and 1.

22
Nominal/Categorical Data
• Distance works naturally with numerical attributes.

• Binary categorical attributes can be encoded as 1
or 0.

23
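
For example, a binary attribute such as owns_house = {Yes, No} (an illustrative attribute, not from the slide) can be mapped to 1/0 before computing distances:

owns_house_code = {"Yes": 1, "No": 0}
encoded = [owns_house_code[v] for v in ["Yes", "No", "Yes"]]  # -> [1, 0, 1]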
KNN Classification

[Scatter plot: Loan ($0 – $250,000) versus Age (0 – 70), with points
labeled Default and Non-Default.]

24
KNN Classification – Distance
Age Loan Default Distance
25 $40,000 N 102000
35 $60,000 N 82000
45 $80,000 N 62000
20 $20,000 N 122000
35 $120,000 N 22000
52 $18,000 N 124000
23 $95,000 Y 47000
40 $62,000 Y 80000
60 $100,000 Y 42000
48 $220,000 Y 78000
33 $150,000 Y 8000

48 $142,000 ?

$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$

25
KNN Classification – Standardized Distance
Age Loan Default Distance
0.125 0.11 N 0.7652
0.375 0.21 N 0.5200
0.625 0.31 N 0.3160
0 0.01 N 0.9245
0.375 0.50 N 0.3428
0.8 0.00 N 0.6220
0.075 0.38 Y 0.6669
0.5 0.22 Y 0.4437
1 0.41 Y 0.3650
0.7 1.00 Y 0.3861
0.325 0.65 Y 0.3771

0.7 0.61 ?
$X_s = \dfrac{X - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$
26
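
A sketch of that min-max scaling applied to the loan data from the previous slides (plain Python; the slide's figures are rounded, so recomputed values may differ slightly in the last digit):

def min_max_scale(values):
    # Map a list of numbers onto the [0, 1] range
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages  = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
loans = [40000, 60000, 80000, 20000, 120000, 18000, 95000, 62000, 100000, 220000, 150000]
ages_s, loans_s = min_max_scale(ages), min_max_scale(loans)

# Scale the query (Age 48, Loan $142,000) with the training data's min and max
qa = (48 - min(ages)) / (max(ages) - min(ages))          # 0.7
ql = (142000 - min(loans)) / (max(loans) - min(loans))   # about 0.61

# Standardized Euclidean distances, matching the table's Distance column
distances = [((a - qa) ** 2 + (l - ql) ** 2) ** 0.5
             for a, l in zip(ages_s, loans_s)]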
Strengths of KNN
• Very simple and intuitive.
• Can be applied to data from any distribution.
• Gives good classification if the number of samples is large enough.

Weaknesses of KNN
• Takes more time to classify a new example:
it needs to calculate and compare the distance from the new
example to all other examples.
• Choosing k may be tricky.
• Needs a large number of samples for accuracy.
27
