Chapter 07 SVM

Unit-III (Part-b)

(Support Vector Machine)

Dr. Anusha Nalajala

Introduction
• A Support Vector Machine (SVM) is a supervised learning model widely used for
classification and regression tasks.
• Although it can be applied to regression tasks, it is best suited for classification.
• The objective of SVM is to identify the best decision boundary that separates the data
points in an n-dimensional feature space.

Introduction
• There can be multiple decision boundaries that separate the data points, but we need
to find the one that classifies them most accurately.
• This best boundary is known as the hyperplane of the SVM.
• Decision boundary:
– 2D feature space: straight line
– 3D : Plane
– N-D: Hyperplane

Hyperplane of SVM

• Support Vectors: the data points that lie closest to the hyperplane.
• Hyperplane: the best decision boundary that separates the data points accurately.
• Margin: the distance between the hyperplane and the nearest data points.

How does it work

• Thumb rule to identify the right hyperplane:
– Select the hyperplane that segregates the two classes best.
– Maximize the distance between the nearest data points and the hyperplane.
– This maximum distance is called the Margin.

Mathematical Intuition
• The equation of a straight line separating two classes in a 2D feature space is
y = mx + c
• In SVM, this line is represented in a more general form that works in any number of
dimensions:
f(x) = w·x + b = 0
– where: w is the weight vector (direction) perpendicular to the hyperplane,
– x is the input vector (a data point),
– b is the bias term (intercept).

Classification Rule
• The SVM decides which side of the boundary each data point lies on based on the sign
of f(x):
– if f(x) = w·x + b > 0, the point is assigned to the positive class (+1),
– if f(x) = w·x + b < 0, the point is assigned to the negative class (−1).
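A minimal sketch of this decision rule in Python; the weight vector w and bias b here are
illustrative assumptions, not values taken from the slides:

```python
import numpy as np

# Illustrative (assumed) hyperplane parameters for a 2D feature space.
w = np.array([2.0, -1.0])   # weight vector perpendicular to the hyperplane
b = -3.0                    # bias term (intercept)

def f(x):
    """Score of a point: f(x) = w . x + b."""
    return np.dot(w, x) + b

def predict(x):
    """Classify by the sign of f(x): +1 on one side of the boundary, -1 on the other."""
    return 1 if f(x) >= 0 else -1

print(predict(np.array([4.0, 1.0])))   # f = 8 - 1 - 3 = 4  -> +1
print(predict(np.array([0.0, 2.0])))   # f = 0 - 2 - 3 = -5 -> -1
```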

The Margin: Making Space Between the Classes
• The key to SVM is maximizing the margin — the distance between the
boundary line and the closest data points (called support vectors) in each class.
• The margin width M is defined as:
M = 2 / ∥w∥
– 2: the gap spans the two parallel margin boundaries (one on each side of the hyperplane),
– ∥w∥: the magnitude (length) of the weight vector w.
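A tiny sketch computing this margin width for an assumed weight vector (the value of w is
illustrative only):

```python
import numpy as np

w = np.array([2.0, -1.0])           # assumed weight vector (not from the slides)
margin = 2.0 / np.linalg.norm(w)    # M = 2 / ||w||, the gap between the two parallel boundaries
print(margin)                       # 2 / sqrt(5) ≈ 0.894
```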

Constraints
• For each data point (xi, yi), where yi is the label (+1 or −1), we want it to be
correctly classified and at least a margin's distance away from the boundary. This leads
to the following constraints:
yi (w·xi + b) ≥ 1 for all i
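A quick numeric check of these constraints, using an assumed hyperplane and two hand-picked
points (all values are illustrative only):

```python
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0   # assumed hyperplane parameters

X = np.array([[4.0, 1.0],           # a positive-class point
              [0.5, 0.5]])          # a negative-class point
y = np.array([1, -1])

margins = y * (X @ w + b)           # each entry should be >= 1 if yi (w . xi + b) >= 1 holds
print(margins)                      # [2. 2.] -> both constraints are satisfied
```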

The Primal Optimization Problem
Now, we have two goals:
1. Maximize the margin (equivalently, minimize ∥w∥),
2. Satisfy the constraint for each data point.
• The primal form of the SVM optimization problem can be written as:
minimize (1/2)∥w∥²  subject to  yi (w·xi + b) ≥ 1 for all i
• Maximum margin with noise:
– Noise: mislabeled points, outliers, or overlapping data.
– A strict margin might not always be possible.
– SVM handles this by using a soft margin.

Soft Margin
• Achieved by introducing slack variables ξi, one for each data point.
• ξi measures how much each data point violates the margin:
– If ξi = 0, the point is correctly classified and outside the margin.
– If ξi > 0, the point is either inside the margin or misclassified.
• Optimization Problem for Soft Margin SVM:
– The goal now becomes to maximize the margin while minimizing the total error from the
points that violate the margin.
– The optimization problem is formulated as:
minimize (1/2)∥w∥² + C Σ ξi  subject to  yi (w·xi + b) ≥ 1 − ξi,  ξi ≥ 0 for all i
– C is a parameter that controls the trade-off between maximizing the margin and allowing
some violations (large C: less tolerance of errors; small C: more tolerance of errors).
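A hedged sketch of the C trade-off using scikit-learn's SVC on toy, overlapping data
(scikit-learn is assumed to be available; the cluster centres and C values are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clusters: a strict (hard) margin is impossible here.
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2) - [2, 2]])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C: high tolerance of violations, wider margin, typically more support vectors.
    # Large C: low tolerance of violations, narrower margin, typically fewer support vectors.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors")
```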
The Primal Optimization Problem
• The primal form is straightforward (it ensures that each data point is classified on the
correct side) but has limitations:
– It becomes complicated when there are many data points.
– It cannot be applied to data that is not linearly separable.

Linearly and Non-Linearly Separable Data
• Linearly separable: the data points can be classified using a single line/plane (hyperplane).
• Non-linearly separable: the data points cannot be classified using a single hyperplane.
The Dual Formulation: Alternative Approach
• The dual approach focuses on the relationships between data points rather than
calculating the boundary directly through w and b.
• The relationship is established by attaching a Lagrange multiplier αi to each point.
• The Lagrangian for the original (primal) problem is:
L(w, b, α) = (1/2)∥w∥² − Σ αi [ yi (w·xi + b) − 1 ],  αi ≥ 0

The Dual Formulation: Alternative Approach
• Minimizing the Lagrangian with respect to w and b, and substituting back, leads to the
dual problem, which is maximized over the multipliers αi:
maximize Σ αi − (1/2) Σ Σ αi αj yi yj (xi·xj)  subject to  Σ αi yi = 0,  αi ≥ 0
• Sum of all αi: adds up the multipliers (weights) for each point.
• Double sum with αi αj: this part calculates the relationships between pairs of data
points xi and xj along with their class labels yi and yj.

The Dual Formulation: Alternative Approach
• If we consider only the support vectors, i.e. the points whose αi are non-zero, then the
decision function is:
f(x) = sign( Σ αi yi (xi·x) + b )
• The dual formulation is designed to balance:
– Maximizing αi for points close to the boundary (the support vectors), which influence
the boundary.
– Minimizing the influence of points far from the boundary by setting their αi = 0.
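An illustrative sketch of this decision function, with hypothetical support vectors,
multipliers αi, labels, and bias (none of these are learned here; they only show how the
sum is evaluated):

```python
import numpy as np

# Hypothetical support vectors and their multipliers/labels (illustrative only).
support_vectors = np.array([[1.0, 2.0],
                            [2.0, 0.5],
                            [-1.0, -1.0]])
alphas = np.array([0.7, 0.3, 1.0])   # non-zero Lagrange multipliers
labels = np.array([1, 1, -1])        # class labels of the support vectors
b = -0.2                             # bias term

def decision(x):
    # f(x) = sign( sum_i alpha_i * y_i * (x_i . x) + b ); only support vectors contribute.
    return np.sign(np.sum(alphas * labels * (support_vectors @ x)) + b)

print(decision(np.array([1.5, 1.0])))   # +1 or -1 depending on which side of the boundary x lies
```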

Non-Linear SVM: Kernel Trick
• If the data isn’t linearly separable (like in a circular pattern), SVM uses the
kernel trick to map data into a higher-dimensional space where it can be
linearly separated. (Used for solving regression tasks too)
• The kernel trick uses a function K(xi, xj) that computes the dot product in this
high-dimensional space without computing the mapping explicitly.
• The common kernels are:
– Linear: K(xi, xj) = xi·xj
– Polynomial: K(xi, xj) = (xi·xj + c)^d
– RBF (Gaussian): K(xi, xj) = exp(−γ ∥xi − xj∥²)
– Sigmoid: K(xi, xj) = tanh(γ xi·xj + r)
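A minimal sketch of the kernel trick in practice, using scikit-learn's SVC with the RBF
kernel on circularly separated toy data (scikit-learn is assumed; the gamma value and the
radius threshold are arbitrary choices):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.where(np.linalg.norm(X, axis=1) < 0.5, 1, -1)   # inner circle vs. outer ring

linear_clf = SVC(kernel="linear").fit(X, y)       # a single straight line cannot separate a circle
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # implicit mapping to a higher-dimensional space

print("linear kernel accuracy:", linear_clf.score(X, y))
print("RBF kernel accuracy:   ", rbf_clf.score(X, y))
```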

Logistic Regression
• Statistical method used for binary classification
• Logistic regression uses a logistic function (also called the sigmoid function) to
map the output to a range between 0 and 1.
• The sigmoid function used to transform the linear output z = w·x + b into a probability is:
σ(z) = 1 / (1 + e^(−z))
• The output of the logistic function is always between 0 and 1 and is interpreted as a
class probability.
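A small sketch of this mapping, with assumed weights and intercept (illustrative values
only, not fitted to any data):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function: maps any real z to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -0.8])   # assumed weights
b = 0.2                     # assumed intercept

def predict_proba(x):
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([2.0, 1.0])
p = predict_proba(x)        # z = 2.4 -> sigmoid(2.4) ≈ 0.917
print(p, "-> class", 1 if p >= 0.5 else 0)
```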


