
Lecture 02

Nearest Neighbor Methods

STAT 451: Intro to Machine Learning, Fall 2020


Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat451-fs2020/

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 1


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 2
Applications of Nearest Neighbor Methods

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors


The major problem of many on-line web sites is the presentation of many choices to the
client at a time; this usually results to strenuous and time consuming task in finding the
right product or information on the site. In this work, we present a study of automatic web
usage data mining and recommendation system based on current user behavior through
his/her click stream data on the newly developed Really Simple Syndication (RSS) reader
website, in order to provide relevant information to the individual without explicitly asking
for it. The K-Nearest-Neighbor (KNN) classification method has been trained to be used
on-line and in Real-Time to identify clients/visitors click stream data, matching it to a
particular user group and recommend a tailored browsing option that meet the need of the
specific user at a particular time. [...]

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 4


Distance Metric Learning for Large Margin Nearest Neighbor Classification

Kilian Q. Weinberger, John Blitzer, and Lawrence K. Saul
Department of Computer and Information Science, University of Pennsylvania
Levine Hall, 3330 Walnut Street, Philadelphia, PA 19104
{kilianw, blitzer, lsaul}@cis.upenn.edu

Weinberger, Kilian Q., John Blitzer, and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." Advances in Neural Information Processing Systems. 2006.

Abstract: We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.

From the introduction: "The k-nearest neighbors (kNN) rule is one of the oldest and simplest methods for pattern classification. Nevertheless, it often yields competitive results, and in certain domains, when cleverly combined with prior knowledge, it has significantly advanced the state-of-the-art. The kNN rule classifies each unlabeled example by the majority label among its k-nearest neighbors in the training set. Its performance thus depends crucially on the distance metric used to identify nearest neighbors. In the absence of prior knowledge, most kNN classifiers use simple Euclidean distances."

Figure 3 (from the paper): Images from the AT&T face recognition data base. Top row: an image correctly recognized by kNN classification (k = 3) with Mahalanobis distances, but not with Euclidean distances. Middle row: correct match among the k = 3 nearest neighbors according to Mahalanobis distance, but not Euclidean distance. Bottom row: incorrect match among the k = 3 nearest neighbors according to Euclidean distance, but not Mahalanobis distance.

Figure 4 (from the paper): Top row: examples of MNIST images whose test nearest neighbor changes during training. Middle row: nearest neighbor after training, using the Mahalanobis distance metric. Bottom row: nearest neighbor before training, using the Euclidean distance metric.

Spoken letters: The Isolet data set from the UCI Machine Learning Repository has 6238 examples and 26 classes corresponding to letters of the alphabet. The authors reduced the input dimensionality (originally 617) by projecting the data onto its leading 172 principal components, enough to account for 95% of its total variance. On this data set, Dietterich and Bakiri report test error rates of 4.2% using nonlinear backpropagation networks with 26 output units (one per class) and 3.3% using nonlinear backpropagation networks with a 30-bit error correcting code. LMNN with energy-based classification obtains a test error rate of 3.7%.

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 5
Journal of Cleaner Production 249 (2020) 119409

Remaining useful life estimation of lithium-ion cells based on k-nearest neighbor regression with differential evolution optimization

Yapeng Zhou (a), Miaohua Huang (a, *), Michael Pecht (b)
(a) Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan, 430070, PR China
(b) Center for Advanced Life Cycle Engineering, University of Maryland, College Park, MD, 20742, USA

Article history: Received 24 June 2019; Received in revised form 9 November 2019; Accepted 20 November 2019; Available online 22 November 2019. Handling editor: Bin Chen.
Keywords: Lithium-ion cell; Remaining useful life; K-nearest neighbor regression; Differential evolution

Abstract: Remaining useful life estimation is of great importance to customers who use battery-powered products. This paper develops a remaining useful life estimation model based on k-nearest neighbor regression by incorporating data from all the cells in a battery pack. A differential evolution technique is employed to optimize the parameters in the estimation model. In this approach, remaining useful life is estimated from a weighted average of the useful life of several nearest cells that share a similar degradation trend to the cell whose remaining useful life needs to be estimated. The developed method obtains a remaining useful life estimation result with average error of 9 cycles, and the best estimation only has an error of 2 cycles. All of these estimations are done within 10 ms. Increasing the number of tested cells and nearest cells improves the estimation accuracy. The developed method reduces the estimation average error by 83.14% and 89.79% compared to particle filter and support vector regression, respectively. Therefore, results and comparison validate the effectiveness of the developed method for remaining useful life estimation of lithium-ion cells. © 2019 Elsevier Ltd. All rights reserved.

(The slide shows the first pages of the article, including its introduction, the cell charge/discharge cycling experiment, Fig. 1 "Flowchart of parameter optimization and RUL estimation," and Fig. 2 "Cell test bench.")

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 6
Biomolecules 2020, 10, 454

Article: Machine Learning to Identify Flexibility Signatures of Class A GPCR Inhibition

Joseph Bemister-Buffington (1), Alex J. Wolf (1), Sebastian Raschka (1,2,*), and Leslie A. Kuhn (1,3,*)
(1) Protein Structural Analysis and Design Lab, Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Road, East Lansing, MI 48824-1319, USA
(2) Department of Statistics, University of Wisconsin-Madison, Medical Science Center, 1300 University Avenue, Madison, WI 53706, USA
(3) Department of Computer Science and Engineering, Michigan State University, 603 Wilson Road, East Lansing, MI 48824-1319, USA
* Correspondence: [email protected] (S.R.); [email protected] (L.A.K.)
Received: 21 February 2020; Accepted: 11 March 2020; Published: 14 March 2020

Abstract: We show that machine learning can pinpoint features distinguishing inactive from active states in proteins, in particular identifying key ligand binding site flexibility transitions in GPCRs that are triggered by biologically active ligands. Our analysis was performed on the helical segments and loops in 18 inactive and 9 active class A G protein-coupled receptors (GPCRs). These three-dimensional (3D) structures were determined in complex with ligands. However, considering the flexible versus rigid state identified by graph-theoretic ProFlex rigidity analysis for each helix and loop segment with the ligand removed, followed by feature selection and k-nearest neighbor classification, was sufficient to identify four segments surrounding the ligand binding site whose flexibility/rigidity accurately predicts whether a GPCR is in an active or inactive state. GPCRs bound to inhibitors were similar in their pattern of flexible versus rigid regions, whereas agonist-bound GPCRs were more flexible and diverse. This new ligand-proximal flexibility signature of GPCR activity was identified without knowledge of the ligand binding mode or previously defined switch regions, while being adjacent to the known transmission switch. Following this proof of concept, the ProFlex flexibility analysis coupled with pattern recognition and activity classification may be useful for predicting whether newly designed ligands behave as activators or inhibitors in protein families in general, based on the pattern of flexibility they induce in the protein.

Keywords: GPCR activity determinants; flexibility analysis; coupled residues; allostery; ProFlex; MLxtend; feature selection; pattern classification

(The slide also shows excerpts from the paper: Section 2.2 "Defining Regions in GPCR Structures for Machine Learning," Figure 1 "Class A GPCR architecture, partitioned into segments for machine learning analysis," and Figure 7, which highlights the four GPCR regions whose flexibility allows the most discrimination between active and inactive structures.)

Reference: Joseph Bemister-Buffington, Alex J. Wolf, Sebastian Raschka, and Leslie A. Kuhn (2020) "Machine Learning to Identify Flexibility Signatures of Class A GPCR Inhibition," Biomolecules 2020, 10, 454.

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 7


1-Nearest Neighbor

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors


1-Nearest Neighbor

Task: predict the target / label of a new data point

? ?

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 9


1-Nearest Neighbor

Task: predict the target / label of a new data point

? ?

How? Look at most "similar" data point in training set

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 10


1-Nearest Neighbor Training Step

⟨x^[i], y^[i]⟩ ∈ 𝒟   (|𝒟| = n)

How do we "train" the 1-NN model?

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 11


1-Nearest Neighbor Training Step

⟨x^[i], y^[i]⟩ ∈ 𝒟   (|𝒟| = n)

To train the 1-NN model, we simply "remember" the


training dataset

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 12


1-Nearest Neighbor Prediction Step
Given: ⟨x^[i], y^[i]⟩ ∈ 𝒟   (|𝒟| = n),   and a query point ⟨x^[q], ???⟩

Predict: f(x^[q])

Algorithm:

    closest_point := None
    closest_distance := ∞
    for i = 1, ..., n:
        current_distance := d(x^[i], x^[q])
        if current_distance < closest_distance:
            closest_distance := current_distance
            closest_point := x^[i]
    return closest_point
Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 13
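For illustration, here is a minimal Python sketch of this 1-NN prediction step (my own example code, not from the slides); it takes the distance function d as a parameter so that any of the distance measures discussed on the following slides can be plugged in:

def predict_1nn(train, x_query, d):
    # train: list of (x_i, y_i) pairs; d: distance function, e.g., Euclidean
    closest_point = None
    closest_label = None
    closest_distance = float("inf")
    for x_i, y_i in train:
        current_distance = d(x_i, x_query)
        if current_distance < closest_distance:
            closest_distance = current_distance
            closest_point, closest_label = x_i, y_i
    return closest_label  # predicted target: label of the closest training point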
Commonly used: Euclidean Distance (L2)

d(x^[a], x^[b]) = √( ∑_{j=1}^{m} (x_j^[a] − x_j^[b])^2 )

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 14
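A plain-Python sketch of this Euclidean (L2) distance (illustrative only; in practice numpy.linalg.norm(a - b) computes the same quantity):

import math

def euclidean_distance(x_a, x_b):
    # square root of the sum of squared feature-wise differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_a, x_b)))

# example: euclidean_distance([1.0, 2.0], [4.0, 6.0]) returns 5.0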


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 15
Nearest Neighbor Decision Boundary

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 16


Decision Boundary Between (a) and (b)

a a

What does it look like?


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 17
Decision Boundary Between (a) and (c)

a c

What does it look like?

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 18


a a

Decision Boundary Between (a) and (c)

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 19


Decision Boundary of 1-NN

a a c

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 20


Decision Boundary of 1-NN

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 21


Which Point is Closest to ?

(Figure: a query point ? with candidate neighbors a and c, and a Euclidean distance circle drawn around the query point.)

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 22


Depends on the Distance Measure!

(Figure: the same query point ? with candidate neighbors a and c; under Euclidean distance = 1 one point is the nearest neighbor, while under Manhattan distance = 1 the other is.)

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 23


Some Common Continuous Distance Measures

Euclidean

Manhattan

Minkowski:   d(x^[a], x^[b]) = [ ∑_{j=1}^{m} |x_j^[a] − x_j^[b]|^p ]^(1/p)

Mahalanobis

Cosine similarity
...
Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 24
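As a sketch, the Minkowski distance can be written in a few lines of Python (illustrative code); p = 1 recovers the Manhattan distance and p = 2 the Euclidean distance:

def minkowski_distance(x_a, x_b, p=2):
    # [ sum_j |x_a_j - x_b_j|^p ]^(1/p)
    return sum(abs(a - b) ** p for a, b in zip(x_a, x_b)) ** (1.0 / p)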
Some Discrete Distance Measures

Hamming distance:   d(x^[a], x^[b]) = ∑_{j=1}^{m} |x_j^[a] − x_j^[b]|,   where x_j ∈ {0, 1}

Jaccard/Tanimoto similarity:   J(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|)

Dice:   D(A, B) = 2|A ∩ B| / (|A| + |B|)

...
Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 25
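Illustrative Python sketches of these discrete measures, assuming binary vectors for the Hamming distance and Python sets for the Jaccard and Dice coefficients:

def hamming_distance(x_a, x_b):
    # number of positions in which two binary vectors differ
    return sum(abs(a - b) for a, b in zip(x_a, x_b))

def jaccard_similarity(A, B):
    # |A intersect B| / |A union B|
    return len(A & B) / len(A | B)

def dice_similarity(A, B):
    # 2 |A intersect B| / (|A| + |B|)
    return 2 * len(A & B) / (len(A) + len(B))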
Feature Scaling

(Figure: two panels showing points a, b, c and a query point ? with Euclidean distance = 1 circles; after rescaling the feature axes, a different point becomes the nearest neighbor.)

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 26
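Because distances depend on the units of each feature axis, kNN is usually applied after feature scaling; a minimal sketch using scikit-learn's StandardScaler (the small arrays are made up purely for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
X_test = np.array([[1.5, 250.0]])

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # estimate mean/std on the training set only
X_test_std = scaler.transform(X_test)        # reuse the training-set statistics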


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 27
k-Nearest Neighbors

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 28


A
y:
Majority vote:
Plurality vote:

B
y:
Majority vote: None
Plurality vote:

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 29


kNN for Classification

𝒟_k = {⟨x^[1], f(x^[1])⟩, …, ⟨x^[k], f(x^[k])⟩},   𝒟_k ⊆ 𝒟

h(x^[q]) = arg max_{y ∈ {1, …, t}} ∑_{i=1}^{k} δ(y, f(x^[i]))

where δ(a, b) = 1 if a = b, and 0 if a ≠ b.

Equivalently:   h(x^[q]) = mode({f(x^[1]), …, f(x^[k])})

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 30
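A compact Python sketch of this plurality-vote rule (my own illustrative code); d is any distance function, such as the euclidean_distance sketch earlier:

from collections import Counter

def knn_classify(train, x_query, k, d):
    # select the k training pairs closest to the query point
    neighbors = sorted(train, key=lambda pair: d(pair[0], x_query))[:k]
    labels = [y_i for _, y_i in neighbors]
    # plurality vote = mode of the neighbor labels
    return Counter(labels).most_common(1)[0][0]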


kNN for Regression

𝒟_k = {⟨x^[1], f(x^[1])⟩, …, ⟨x^[k], f(x^[k])⟩},   𝒟_k ⊆ 𝒟

h(x^[t]) = (1/k) ∑_{i=1}^{k} f(x^[i])

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 31
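The regression variant only changes the aggregation step: instead of a vote, it averages the targets of the k nearest neighbors (illustrative sketch):

def knn_regress(train, x_query, k, d):
    # mean target value of the k nearest training points
    neighbors = sorted(train, key=lambda pair: d(pair[0], x_query))[:k]
    return sum(y_i for _, y_i in neighbors) / k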


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime


complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 32
Big-O

f(n)       Name
1          Constant
log n      Logarithmic
n          Linear
n log n    Log Linear
n^2        Quadratic
n^3        Cubic
n^c        Higher-level polynomial
2^n        Exponential

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 33


Big-O

Big-O notation is used in both mathematics and computer science to study the asymptotic behavior of functions, i.e., their asymptotic upper bounds. In the context of algorithms, it is most commonly used to measure the time complexity or runtime of an algorithm in the worst-case scenario. (Often, it is also used to measure memory requirements.) Since computational complexity is a large field of research in computer science, we will not go into too much detail in this course. However, you should at least be familiar with the basic concepts, which are important for the study of machine learning algorithms.

f(n)       Name
1          Constant
log n      Logarithmic
n          Linear
n log n    Log Linear
n^2        Quadratic
n^3        Cubic
n^c        Higher-level polynomial
2^n        Exponential

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 34


Big-O Example 1

f(x) = 14x^2 − 10x + 25

𝒪( )

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 35


Big-O Example 2

f(x) = (2x + 8) log_2(x + 9)

𝒪( )

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 36


Big-O Example 2

f(x) = (2x + 8) log_2(x + 9)

Why don't we have to distinguish between different logarithms?

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 37


Big-O Example 3
In [16]:

𝒪( )

A = [[1, 2, 3],
     [2, 3, 4]]

B = [[5, 8],
     [6, 9],
     [7, 10]]

def matrixmultiply(A, B):
    # the result C has len(A) rows and len(B[0]) columns
    C = [[0 for _ in range(len(B[0]))]
         for _ in range(len(A))]
    for row_a in range(len(A)):
        for col_b in range(len(B[0])):
            for col_a in range(len(A[0])):
                C[row_a][col_b] += \
                    A[row_a][col_a] * B[col_a][col_b]
    return C

matrixmultiply(A, B)

Out[16]:

[[38, 56], [56, 83]]

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 38


Big O of kNN

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 39


Improving Computational Performance

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 40


2.7.1 Naive kNN Algorithm in Pseudocode

Naive Nearest Neighbor Search

Below are two naive approaches (Variant A and Variant B) for finding the k nearest neighbors of a query point x^[q].

Variant A   𝒪( )

D_k := {}
while |D_k| < k:
    closest_distance := ∞
    for i = 1, ..., n, ∀ i ∉ D_k:
        current_distance := d(x^[i], x^[q])
        if current_distance < closest_distance:
            closest_distance := current_distance
            closest_point := x^[i]
    add closest_point to D_k


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 41
Naive Nearest Neighbor Search

Variant B   𝒪( )

D_k := 𝒟
while |D_k| > k:
    largest_distance := 0
    for i = 1, ..., n, ∀ i ∈ D_k:
        current_distance := d(x^[i], x^[q])
        if current_distance > largest_distance:
            largest_distance := current_distance
            farthest_point := x^[i]
    remove farthest_point from D_k

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 42


Naive Nearest Neighbor Search

Using a priority queue O( ____ )

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 43
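In Python, the priority-queue idea can be sketched with heapq.nsmallest, which keeps only the k best candidates while scanning the training set (illustrative code, using the same (x_i, y_i) pair convention as the earlier sketches):

import heapq

def k_nearest(train, x_query, k, d):
    # returns the k (x_i, y_i) pairs with the smallest distance to the query point
    return heapq.nsmallest(k, train, key=lambda pair: d(pair[0], x_query))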


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 44
Improving Computational Performance

Data Structures

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 45


Improving Computational Performance

Dimensionality Reduction

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 46


Improving Computational Performance

Editing / "Pruning"

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 47


Improving Computational Performance
Editing / "Pruning"

In edited kNN, we permanently remove data points that do not affect the decision boundary. For example, consider a single data point (aka an "outlier") surrounded by many data points from a different class. If we perform a kNN prediction, this single data point will not affect the class label prediction in plurality voting; hence, we can safely remove it.

(Figure: illustration of kNN editing, where we can remove points from the training set without influencing the predictions. For example, consider a 3-NN model: the points within the dashed lines would not affect the decision boundary as "outliers.")
Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 48
Improving Computational Performance

Prototypes

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 49


Improving Predictive Performance

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 50


Hyperparameters

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 51


Hyperparameters

• Value of k
• Scaling of the feature axes
• Distance measure
• Weighting of the distance measure

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 52
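These hyperparameters are typically tuned with cross-validation; a hedged sketch using scikit-learn's GridSearchCV (the Iris data and the grid values are chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7],         # value of k
    "kneighborsclassifier__weights": ["uniform", "distance"],  # distance weighting
    "kneighborsclassifier__p": [1, 2],                         # Manhattan vs. Euclidean
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)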


k ∈ {1,3,7}
k=_ k=_

k=_

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 53


Feature-Weighting via Euclidean Distance

d_w(x^[a], x^[b]) = √( ∑_{j=1}^{m} w_j (x_j^[a] − x_j^[b])^2 )

As a dot product:

c = x^[a] − x^[b],   (c, x^[a], x^[b] ∈ ℝ^m)

d(x^[a], x^[b]) = √(c^⊤ c)

d_w(x^[a], x^[b]) = √(c^⊤ W c),   where W = diag(w_1, w_2, ..., w_m) ∈ ℝ^{m×m}
Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 54
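A small NumPy sketch of the feature-weighted distance written as a dot product (illustrative code; w is a vector of non-negative feature weights):

import numpy as np

def weighted_euclidean(x_a, x_b, w):
    # d_w = sqrt(c^T W c), with c = x_a - x_b and W = diag(w)
    c = np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)
    W = np.diag(w)
    return float(np.sqrt(c @ W @ c))

# equivalent without forming W explicitly: np.sqrt(np.sum(w * (x_a - x_b) ** 2))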
Distance-weighted kNN
h(x^[t]) = arg max_{j ∈ {1, …, p}} ∑_{i=1}^{k} w^[i] δ(j, f(x^[i]))

w^[i] = 1 / d(x^[i], x^[t])^2

(Add a small constant to the denominator to avoid zero division, or set h(x^[t]) = f(x^[i]) whenever the query point exactly matches a training point x^[i].)

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 55
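An illustrative sketch of the distance-weighted vote (my own code): each of the k nearest neighbors contributes a weight of 1/d^2, with a small constant added to the denominator to avoid division by zero:

from collections import defaultdict

def knn_classify_weighted(train, x_query, k, d, eps=1e-9):
    neighbors = sorted(train, key=lambda pair: d(pair[0], x_query))[:k]
    votes = defaultdict(float)
    for x_i, y_i in neighbors:
        votes[y_i] += 1.0 / (d(x_i, x_query) ** 2 + eps)  # w^[i] = 1 / d(x^[i], x^[q])^2
    return max(votes, key=votes.get)  # label with the largest total weight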


Lecture 2 (Nearest Neighbors)
Topics
1. Intro to nearest neighbor models

2. Nearest neighbor decision boundary

3. K-nearest neighbors

4. Big-O & k-nearest neighbors runtime complexity

5. Improving k-nearest neighbors: modifications


and hyperparameters

6. K-nearest neighbors in Python


Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 56
kNN in Python

DEMO

Sebastian Raschka STAT 451: Intro to ML Lecture 2: Nearest Neighbors 57
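The live demo is not included in this transcript; a minimal scikit-learn example in the same spirit might look like this (illustrative sketch on the Iris dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("Test accuracy: %.2f" % knn.score(X_test, y_test))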
