07 Supervised Machine Learning
Supervised Machine Learning, Lecture 4
Dr. Waad Alhoshan
[email protected]
Department of Computer Science — IMSIU
Previous Lectures
• Defined classification as an ML problem
• A brief introduction to the three classification approaches:
• Non-probabilistic approach: discriminant functions
• Probabilistic approaches such as generative models and discriminative models.
Finding the optimal value of x (the green line in the figure), because this is the decision boundary giving the minimum probability of misclassification.
• In regression, we considered models that were linear functions of the parameters, and we saw
that the minimization of a sum-of-squares error function led to a simple closed-form solution for
the parameter values (or weights).
• The least-squares solution can also be used to solve classification problems by attempting to find
the optimal decision boundary.
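As a sketch of the closed-form least-squares solution mentioned above (the synthetic data and variable names here are illustrative, not from the lecture):

```python
import numpy as np

# Minimal sketch: closed-form least-squares fit for a linear model.
# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])        # assumed "true" weights
t = X @ true_w + rng.normal(scale=0.01, size=100)

# Augment each input with a bias term: x~ = (1, x^T)^T
X_tilde = np.hstack([np.ones((100, 1)), X])

# Minimizing the sum-of-squares error gives w = (X~^T X~)^{-1} X~^T t
w = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ t)
print(w)  # ≈ [0, 2, -1, 0.5]: bias then the recovered weights
```

The same closed-form expression reappears below when least squares is applied to classification.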
Each class has its own discriminant function; consider a simple generalized linear model as follows:

$y_k(x) = w_k^T x + w_{k0}, \quad k = 1, \dots, K$ classes

• $y_k(x)$ tells us "how far from the decision boundary" the input lies.
• For all classes, a shorter notation would be: $y(x) = \tilde{W}^T \tilde{x}$
• Where $\tilde{W}$ is a matrix containing $K$ columns, where the $k$-th column is $\tilde{w}_k = (w_{k0}, w_k^T)^T$
• And $\tilde{x} = (1, x^T)^T$ is the augmented vector of input features,
• And $y(x)$ is the output vector of the discriminant functions of all classes.
• Finally, input $x$ will be assigned to class $C_k$ with the maximum output predictor $y_k(x)$.
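The decision rule above can be sketched as follows; the weight values are made up purely for illustration:

```python
import numpy as np

# Sketch: multi-class linear discriminants y(x) = W~^T x~.
# W~ has one column per class; each column is (w_k0, w_k^T)^T.
# These weights are illustrative, not learned.
W_tilde = np.array([[0.0,  1.0, -0.5],   # biases w_10, w_20, w_30
                    [1.0, -1.0,  0.0],   # weights for feature 1
                    [0.0,  1.0,  2.0]])  # weights for feature 2

def classify(x):
    x_tilde = np.concatenate([[1.0], x])  # augment: x~ = (1, x^T)^T
    y = W_tilde.T @ x_tilde               # one discriminant value per class
    return int(np.argmax(y))              # class with the maximum output

print(classify(np.array([2.0, 0.0])))     # → 0
```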
If we applied the previous function, we would get different (continuous) values of $y_k(x)$ for each class. This won't help directly in the classification problem! So, we want to predict binary values 0 and 1.
Basically, transform the output into a one-hot encoding of binary values 0 and 1. The goal is to get a one-hot encoding to represent the output from the $K$ classes.
FYI: "One Hot Encoding is used to convert numerical categorical variables into binary vectors."
Example: with $k = 5$ classes, if example $j = 1$ (one of the $n$ examples) belongs to the first class, then its target row is
$t_j^T = (1, 0, 0, 0, 0)$
and the target matrix stacks one such row per example:
$T = \begin{pmatrix} t_1^T \\ \vdots \\ t_n^T \end{pmatrix}$
We want to get 0 for a class if the example does not belong to it, and 1 otherwise!
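A short sketch of building such a one-hot target matrix (the helper name `one_hot` is my own, not from the lecture):

```python
import numpy as np

# Sketch: 1-of-K (one-hot) target encoding.
# Each row of T has a single 1 in the column of that example's class.
def one_hot(labels, K):
    T = np.zeros((len(labels), K))            # one row per example
    T[np.arange(len(labels)), labels] = 1.0   # set the class column to 1
    return T

labels = np.array([0, 4, 2])                  # class indices of 3 examples
T = one_hot(labels, 5)
print(T)  # rows: (1,0,0,0,0), (0,0,0,0,1), (0,0,1,0,0)
```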
The least-squares solution is
$y_{LS}(x) = \tilde{W}_{LS}^T \tilde{x}$
Remember our interpretation of $y_k(x)$: it's the distance from the decision boundary.
• Input dataset: feature vectors $x_n$ with binary class targets.
• Target: it is more convenient to use target values $t = +1$ for class $C_1$ and $t = -1$ for class $C_2$, which matches the choice of activation function.
• Classification decision: from the prediction function (the discriminant function), assign $C_1$ if $y(x) > 0$ and $C_2$ if $y(x) < 0$.
Cont.
• There might be many solutions, depending on the initialization of the weights and on the order in which the data points are presented in SGD.
• If the dataset is not linearly separable, the perceptron algorithm will not converge.
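The perceptron learning rule described above can be sketched as follows, using the $t \in \{+1, -1\}$ target convention; the synthetic data and step size are my own assumptions:

```python
import numpy as np

# Sketch of the perceptron rule: on a misclassified point (x, t),
# update w <- w + eta * t * x~. Converges only if the data are
# linearly separable; synthetic separable blobs for illustration.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=[-2, -2], size=(30, 2)),
               rng.normal(loc=[2, 2], size=(30, 2))])
t = np.array([-1] * 30 + [1] * 30)          # t = -1 for C2, +1 for C1
X_tilde = np.hstack([np.ones((60, 1)), X])  # augment with bias

w = np.zeros(3)                             # initialization matters!
eta = 1.0                                   # assumed learning rate
for _ in range(100):                        # cap epochs in case of no convergence
    mistakes = 0
    for x_n, t_n in zip(X_tilde, t):
        if np.sign(x_n @ w) != t_n:         # misclassified point
            w += eta * t_n * x_n            # perceptron update
            mistakes += 1
    if mistakes == 0:                       # a full clean pass: converged
        break

pred = np.sign(X_tilde @ w)
print((pred == t).mean())                   # training accuracy
```

Note that reordering the data or changing the initial `w` can yield a different (but equally valid) separating boundary.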