06-07-08: Supervised Learning by Computing Distances, Multi-Class Classification, Decision Boundary
Distances (2): Wrapping up LwP, Nearest Neighbors
CSN-382
LwP: The Prediction Rule, Mathematically

What does the prediction rule for LwP look like mathematically? For a test example $\boldsymbol{x}$ with class prototypes (means) $\boldsymbol{\mu}_{+}$ and $\boldsymbol{\mu}_{-}$, LwP predicts the class whose prototype is closer:

$f(\boldsymbol{x}) = \mathrm{sign}\left(\lVert \boldsymbol{x}-\boldsymbol{\mu}_{-}\rVert^{2} - \lVert \boldsymbol{x}-\boldsymbol{\mu}_{+}\rVert^{2}\right) = \mathrm{sign}(\boldsymbol{w}^{\top}\boldsymbol{x} + b)$, where $\boldsymbol{w} = 2(\boldsymbol{\mu}_{+}-\boldsymbol{\mu}_{-})$ and $b = \lVert\boldsymbol{\mu}_{-}\rVert^{2} - \lVert\boldsymbol{\mu}_{+}\rVert^{2}$.

For binary classification, $\boldsymbol{w}^{\top}\boldsymbol{x} + b$ can be treated as the "score" of the input and thresholded (at zero) to get the binary label. Recall that the LwP model can therefore also be seen as a linear model, although it wasn't originally formulated like this.

Q: Wait, when discussing LwP, wasn't the linear model of the form $\boldsymbol{w}^{\top}\boldsymbol{x}$? Where did the "bias" term go?

A: Don't worry, the bias term can easily be folded in. Append a constant feature "1" to each input and rewrite $\boldsymbol{w}^{\top}\boldsymbol{x} + b$ as $\tilde{\boldsymbol{w}}^{\top}\tilde{\boldsymbol{x}}$, where now $\tilde{\boldsymbol{w}} = [\boldsymbol{w}; b]$ and $\tilde{\boldsymbol{x}} = [\boldsymbol{x}; 1]$. We will assume the same and omit the explicit bias from now on.
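As a quick illustration (not part of the original slide), here is a minimal NumPy sketch of this rule, assuming Euclidean distance and labels in {-1, +1}; the function names and toy data are made up for this example.

```python
import numpy as np

def fit_lwp(X, y):
    """Compute the two class prototypes (means) and the equivalent linear model (w, b)."""
    mu_pos = X[y == +1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    # Expanding ||x - mu_neg||^2 - ||x - mu_pos||^2 gives a linear score w.x + b
    w = 2 * (mu_pos - mu_neg)
    b = np.sum(mu_neg ** 2) - np.sum(mu_pos ** 2)
    return w, b

def predict_lwp(w, b, X):
    """Positive score means x is closer to mu_pos than to mu_neg."""
    return np.sign(X @ w + b)

# Toy usage
X = np.array([[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 6.0]])
y = np.array([-1, -1, +1, +1])
w, b = fit_lwp(X, y)
print(predict_lwp(w, b, np.array([[1.5, 1.5], [6.5, 5.5]])))  # -> [-1.  1.]
```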
Multi-class Classification
Multi-class Classification: A classification task with more than two classes;
e.g., classify a set of images of fruits which may be oranges, apples, or pears.
Multi-class classification makes the assumption that each sample is assigned
to one and only one label: a fruit can be either an apple or a pear but not both
at the same time.
• Binary Classification: Classification tasks with two classes.
• Multi-class Classification: Classification tasks with more than two classes.
• Some algorithms are designed for binary classification problems.
• Examples include:
• Logistic Regression
• Perceptron
• Support Vector Machines
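As a small, hedged sketch (the dataset and model settings below are illustrative, not from the slides), the three listed binary classifiers can each be fit on a synthetic two-class problem with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import SVC

# Synthetic two-class dataset
X, y = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=0)

# Each of these models produces a binary decision for every input
for model in (LogisticRegression(), Perceptron(), SVC()):
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))
```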
One-Vs-Rest for Multi-Class Classification
One-vs-rest (OvR for short, also referred to as One-vs-All or OvA) is a heuristic method for
using binary classification algorithms for multi-class classification.
• It involves splitting the multi-class dataset into multiple binary classification problems. A
binary classifier is then trained on each binary classification problem and predictions are
made using the model that is the most confident.
• For example, consider a multi-class classification problem with examples for each of the classes 'red', 'blue', and 'green'. This could be divided into three binary classification datasets as follows:
• Binary Classification Problem 1: red vs [blue, green]
• Binary Classification Problem 2: blue vs [red, green]
• Binary Classification Problem 3: green vs [red, blue]
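A minimal sketch of this scheme using scikit-learn's OneVsRestClassifier wrapper; the red/blue/green data below is synthetic and only meant to mirror the example above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three synthetic clusters, one per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ([0, 0], [3, 0], [0, 3])])
y = np.array(['red'] * 30 + ['blue'] * 30 + ['green'] * 30)

# Internally fits one binary classifier per class (red vs rest, blue vs rest,
# green vs rest) and predicts with the most confident one
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(ovr.predict([[0.1, 0.2], [2.9, 0.1], [0.2, 3.1]]))
```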
Geometric Margin

[Figure: two classes with prototypes $\mu_{+}$ and $\mu_{-}$, showing the geometric margin between them under a weighted distance]

$d_{\boldsymbol{w}}(\boldsymbol{a}, \boldsymbol{b}) = \sqrt{\sum_{i=1}^{D} w_i (a_i - b_i)^2}$

Use a smaller $w_i$ for the horizontal-axis feature in this example.
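A small NumPy sketch of this weighted distance (the weight values below are made up purely for illustration): down-weighting the horizontal-axis feature shrinks its influence on the distance.

```python
import numpy as np

def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance d_w(a, b) with per-feature weights w."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

a = np.array([1.0, 4.0])
b = np.array([3.0, 1.0])
print(weighted_euclidean(a, b, np.array([1.0, 1.0])))  # plain Euclidean distance
print(weighted_euclidean(a, b, np.array([0.1, 1.0])))  # horizontal axis down-weighted
```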
Supervised Learning using Nearest Neighbors
Nearest Neighbors

Another supervised learning technique based on computing distances. Very simple idea; simply do the following at test time:
• Compute the distance of the test point from all the training points
• Sort the distances to find the "nearest" input(s) in the training data
• Predict the label using the majority (or average) label of these inputs

Q: Wait, did you say distances from ALL the training points? That's going to be so expensive!

A: Yes, but let's not worry about that at the moment. There are ways to speed up this step.

The nearest neighbour approach induces a Voronoi tessellation/partition of the input space: all test points falling in a cell will get the label of the training input in that cell.
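A minimal 1-nearest-neighbor sketch of the steps above, assuming Euclidean distance; the helper name and toy data are illustrative.

```python
import numpy as np

def nn_predict(X_train, y_train, x_test):
    """Predict the label of x_test by copying the label of its nearest training point."""
    dists = np.linalg.norm(X_train - x_test, axis=1)  # distance to every training point
    return y_train[np.argmin(dists)]                   # label of the nearest one

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(nn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(nn_predict(X_train, y_train, np.array([5.5, 5.2])))  # -> 1
```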
K Nearest Neighbors (KNN)

In many cases, it helps to look at not one but K > 1 nearest neighbors of the test input.

Q: How to pick the "right" K value?

A: K is this model's "hyperparameter" (a hyperparameter is a parameter whose value is used to control the learning process). One way to choose it is using "cross-validation" (will see shortly). Also, K should ideally be an odd number to avoid ties.

Essentially, taking more votes helps! It also leads to smoother decision boundaries (less chance of overfitting on the training data).
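One possible way to pick K by cross-validation, sketched with scikit-learn's KNeighborsClassifier on the Iris dataset; the candidate K values are arbitrary (odd, to avoid ties).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):
    # 5-fold cross-validation accuracy for each candidate K
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean CV accuracy = {scores.mean():.3f}")
```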
ε-Ball Nearest Neighbors (ε-NN)

Rather than looking at a fixed number of neighbors, we can look inside a ball of a given radius ε around the test input.

Q: So changing ε may change the prediction. How to pick the "right" ε value?
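A sketch of the ε-ball idea using scikit-learn's RadiusNeighborsClassifier, where the radius parameter plays the role of ε; the toy data and radius value are illustrative.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

X_train = np.array([[0.0, 0.0], [1.0, 0.5], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

# All training points within radius epsilon of the test input vote on its label
eps_nn = RadiusNeighborsClassifier(radius=2.0)
eps_nn.fit(X_train, y_train)
print(eps_nn.predict([[0.5, 0.2], [5.5, 5.2]]))  # -> [0 1]
```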
Tagging/multi-label learning