Machine Learning
(Unit 2 - Part 2)
In statistical modeling, a common problem is how to estimate the joint probability distribution for a dataset.
What is EM Algorithm?
• The EM algorithm was proposed in 1977 by Arthur Dempster, Nan Laird, and Donald Rubin.
• It is used to find (local) maximum-likelihood parameters of a statistical model
when latent variables are present or the data is missing or incomplete.
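For concreteness, here is a minimal sketch of EM in action, assuming scikit-learn is available: GaussianMixture fits a mixture model by EM, where the unobserved cluster assignment plays the role of the latent variable. The synthetic data and the choice of two components are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: EM via scikit-learn's GaussianMixture (illustrative data and settings).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two latent clusters; which cluster produced each point is the hidden (latent) variable.
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal(3.0, 1.0, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0)  # parameters estimated by EM
gmm.fit(data)

print("estimated means:", gmm.means_.ravel())
print("estimated mixing weights:", gmm.weights_)
```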
Applications of EM Algorithm
Data Clustering in Machine Learning and Computer Vision
Used in Natural Language Processing
Used in Parameter Estimation in Mixture Models and
Quantitative Genetics
Used in Psychometrics
Used in Medical Image Reconstruction, Structural Engineering
Support Vector Machine
Support Vector Machine (SVM)
❑ SVM is based on statistical learning theory.
❑ Support-vector machines are supervised learning models with associated learning
algorithms that analyze data for classification and regression analysis.
❑ SVM involves finding hyperplanes which segregate data into classes.
❑ Support vectors are the data points that lie closest to the decision surface (or
hyperplane).
❑ SVMs are very versatile and are also capable of performing linear or nonlinear
classification, regression, and outlier detection.
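As a quick, hedged illustration of the bullets above, the sketch below trains a linear SVM classifier with scikit-learn on a toy two-class dataset; the dataset and parameter values are arbitrary choices for demonstration only.

```python
# Sketch: a linear SVM classifier on a toy two-class dataset (illustrative only).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)  # toy, separable data
clf = SVC(kernel="linear", C=1.0)  # hyperplane-based classifier; C controls margin softness
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```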
Two-Class Problem: Linearly Separable Case
❑ Linearly separable binary sets (figure: Class 1 points denoted +1, Class 2 points denoted −1).
❑ Many decision boundaries can separate these two classes.
Which one should we choose?
Classifier Margin
Define the margin of a
linear classifier as the width
that the boundary could be
increased by before hitting
a data point.
Good Decision Boundary: Margin Should Be Large
f(x, w, b) = sign(w · x − b)
The maximum margin linear classifier is the linear classifier
with the maximum margin.
This is the simplest kind of SVM (called a Linear SVM).
Support vectors are those data points that the margin pushes up against.
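A small sketch (assuming scikit-learn and made-up points) of the idea above: after fitting a linear SVM, only the support vectors, the points the margin pushes up against, are reported by the model.

```python
# Sketch: the support vectors of a linear SVM are the points the margin pushes up against.
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable points.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates the maximum-margin classifier
print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])  # scikit-learn's decision rule: sign(w.x + b)
```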
How Does it Work?
Identify the right hyper-plane (Scenario-1):
Rule of thumb to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better.”
In this scenario, hyper-plane “B” performs this job best.
How Does it Work?
Identify the right hyper-plane (Scenario-2):
Maximizing the distance between the nearest data point (of either class) and the hyper-plane helps us decide the right hyper-plane.
This distance is called the Margin.
How Does it Work?
Identify the right hyper-plane (Scenario-2):
The margin for hyper-plane C is higher than for both A and B.
Hence, we name the right hyper-plane C.
Another compelling reason for selecting the hyper-plane with the higher margin is robustness.
How Does it Work?
Identify the right hyper-plane (Scenario-3):
SVM selects the hyper-plane which classifies the classes accurately before maximizing the margin.
Here, hyper-plane B has a classification error, while A has classified all points correctly.
Therefore, the right hyper-plane is A.
How Does it Work?
Can we classify two classes (Scenario-4)?
The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin.
Hence, we can say that SVM classification is robust to outliers.
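To make the robustness claim above concrete, the sketch below compares a small and a large C on made-up data containing one outlier; in scikit-learn a smaller C gives a softer margin that can effectively ignore the outlier. All values are illustrative assumptions.

```python
# Sketch: effect of the C parameter when one outlier is present (made-up data).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [1, 2], [2, 1],       # class -1
              [5, 5], [5, 6], [6, 5],       # class +1
              [1.5, 1.5]])                  # outlier labelled +1
y = np.array([-1, -1, -1, 1, 1, 1, 1])

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_vectors_)} support vectors, "
          f"training accuracy = {clf.score(X, y):.2f}")
```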
How Does it Work?
Find the hyper-plane to segregate two classes (Scenario-5):
SVM solves this problem by introducing an additional feature. Here, we add a new feature z = x^2 + y^2 and plot the data points on the x and z axes.
All values of z are always positive, because z is the squared sum of x and y.
In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.
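The sketch below reproduces this idea on circular toy data (an assumption for illustration): adding the feature z = x^2 + y^2 lets a plain linear SVM separate classes that are not linearly separable in the original (x, y) plane.

```python
# Sketch: making circular data separable by adding the feature z = x^2 + y^2.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)  # new feature z = x^2 + y^2
X_new = np.hstack([X, z])                         # original (x, y) plus z

print("linear SVM on (x, y):   ", LinearSVC(dual=False).fit(X, y).score(X, y))
print("linear SVM on (x, y, z):", LinearSVC(dual=False).fit(X_new, y).score(X_new, y))
```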
What is SVM?
Support vector machine, often called SVM, is a supervised learning algorithm.
It can be used for classification and regression problems, as support vector classification (SVC) and support vector regression (SVR).
It is typically used for smaller datasets, as training takes too long on large ones.
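Since the slide distinguishes SVC from SVR, here is a hedged sketch of support vector regression on a made-up 1-D problem (data and hyperparameters are illustrative assumptions).

```python
# Sketch: support vector regression (SVR) on a made-up 1-D problem.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)  # noisy sine curve

reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)     # epsilon-insensitive regression tube
reg.fit(X, y)
print("R^2 on the training data:", reg.score(X, y))
```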
The idea behind SVM
SVM is based on the idea of finding a hyperplane
that best separates the features into different domains
Intuition development
Suppose a stalker is sending you emails, and you want to design a function (hyperplane) that clearly differentiates the two cases, so that whenever you receive an email from the stalker it is classified as spam. The figures below show two cases in which a hyperplane is drawn; which one would you pick, and why?
Terminologies used in SVM
The points closest to the hyperplane are called the support vector points, and the distances of these vectors from the hyperplane are called the margins.
The support vector points are critical in determining the hyperplane, because if their positions change, the hyperplane's position is altered.
Technically, this hyperplane can also be called the margin-maximizing hyperplane.
Hyperplane (Decision surface)
The hyperplane is a function used to differentiate between features.
In 2-D, the function used to classify between features is a line, whereas in 3-D it is a plane.
Similarly, the function which classifies points in higher dimensions is called a hyperplane.
Hyperplane (Decision surface)
Let’s say there are “m” dimensions;
thus the equation of the hyperplane in m dimensions can be given as:
wᵀx + b = w1·x1 + w2·x2 + … + wm·xm + b = 0
where,
wi = weight vector (w1, w2, w3, …, wm)
b = bias term (w0)
x = input variables.
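A tiny numeric sketch of the equation above, with a made-up weight vector, bias, and point: a point is classified by the sign of wᵀx + b.

```python
# Sketch: evaluating the hyperplane equation w^T x + b for made-up w, b, and x.
import numpy as np

w = np.array([2.0, -1.0, 0.5])   # assumed weight vector (w1, w2, w3)
b = -1.0                         # assumed bias term (w0)
x = np.array([1.0, 2.0, 4.0])    # an example point

score = np.dot(w, x) + b         # w^T x + b
label = 1 if score >= 0 else -1  # which side of the hyperplane the point falls on
print(score, label)
```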
Hard margin SVM
Assume three hyperplanes, namely (π, π+, π−), such that ‘π+’ is parallel to ‘π’ and passes through the support vectors on the positive side, and ‘π−’ is parallel to ‘π’ and passes through the support vectors on the negative side.
The equations of the hyperplanes can then be written (with the usual scaling) as:
π : wᵀx + b = 0
π+ : wᵀx + b = +1
π− : wᵀx + b = −1
Hard margin SVM
For each of the labelled points X1, X3, X4, and X6 (shown in the slide figure), we check the constraint yi(wᵀxi + b) ≥ 1, i.e. that the point lies on or beyond its margin hyperplane.
Hard margin SVM
Let’s look at the constraints when the points are not correctly classified:
We can see that the hyperplane is able to distinguish the points only if they are linearly separable; if any outlier is introduced, it is no longer able to separate them.
This type of SVM is called a hard margin SVM.
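Scikit-learn has no strict hard-margin SVM, but a very large C approximates one on separable data, which is enough to check the constraints above; the toy points below are made up for illustration.

```python
# Sketch: approximating a hard-margin SVM with a very large C on separable toy data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0],     # negative class
              [4, 4], [4, 5], [5, 4]])    # positive class
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e9).fit(X, y)   # huge C -> (nearly) hard margin
w, b = clf.coef_[0], clf.intercept_[0]
margins = y * (X @ w + b)                     # y_i (w^T x_i + b); all should be >= 1
print("per-point margins:", np.round(margins, 3))
```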
Support Vector Kernels
❑ The linear classifier relies on an inner product between vectors
K(xi, xj) = xiᵀxj
❑ If every data point is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the inner product becomes:
K(xi, xj) = φ(xi)ᵀφ(xj)
❑ A kernel function is some function that corresponds to an inner product in some
expanded feature space.
Why use kernels?
Make a non-separable problem separable.
Map data into a better representational space.
SVM Kernel Functions
❑ SVM algorithms use a set of mathematical functions that are defined as the kernel. The function of the kernel is to take data as input and transform it into the required form.
❑ Different SVM algorithms use different types of kernel functions, for example linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid kernels.
❑ The kernel functions return the inner product between two points in a suitable feature space, thus defining a notion of similarity with little computational cost, even in very high-dimensional spaces.
Support Vector Kernel
Types of kernels:
Linear Kernel
Polynomial Kernel
Radial Basis Function Kernel (RBF) / Gaussian Kernel
Kernel Functions
❑ Linear Kernel: K(X, Y) = XᵀY + c
❑ Polynomial kernel: K(X, Y) = (γ·XᵀY + r)^d, γ > 0
❑ Radial basis function (RBF) Kernel: K(X, Y) = exp(−‖X − Y‖² / (2σ²)), which in simple form can be written as exp(−γ·‖X − Y‖²), γ > 0
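The formulas above can be evaluated directly; the sketch below computes each kernel with NumPy for two made-up vectors, using arbitrary example values for γ, r, d, and c.

```python
# Sketch: evaluating the linear, polynomial, and RBF kernels for two example vectors.
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([0.5, -1.0, 2.0])
gamma, r, d, c = 0.5, 1.0, 3, 0.0                # arbitrary example hyperparameters

linear = X @ Y + c                               # K(X, Y) = X^T Y + c
poly = (gamma * (X @ Y) + r) ** d                # K(X, Y) = (gamma * X^T Y + r)^d
rbf = np.exp(-gamma * np.sum((X - Y) ** 2))      # K(X, Y) = exp(-gamma * ||X - Y||^2)

print(linear, poly, rbf)
```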
Data representation using kernels
Pros of SVM
It is really effective in higher dimensions.
Effective when the number of features is greater than the number of training examples.
One of the best algorithms when the classes are separable.
The hyperplane is affected only by the support vectors, so outliers have less impact.
SVM is suited to extreme-case binary classification.
Cons of SVM
For larger datasets, it requires a large amount of time to train.
Does not perform well when classes overlap.
Selecting appropriate hyperparameters of the SVM that allow sufficient generalization performance is difficult.
Selecting the appropriate kernel function can be tricky.
Applications
Definition
❑ Margin of Separation (d): the separation between the hyperplane and the closest
data point for a given weight vector w and bias b.
❑ Optimal Hyperplane (maximal margin): the particular hyperplane for which the
margin of separation d is maximized.
❑ Thus, this can be written as:
wᵀxi + b ≥ 0 for di = +1
wᵀxi + b < 0 for di = −1
(Figure: separating hyperplane H with margin hyperplanes H1 and H2 at distances d+ and d−.)
Contents
❑ Non-Linear SVM
▪ Non-Linear SVM: Feature Space
▪ Transformation to Feature Space
❑ SVM kernel functions
❑ Applications
Non-Linear SVM
❑ The idea is to gain linear separability by mapping the data to a higher-dimensional space.
❑ Datasets that are linearly separable (with some noise) work out great.
❑ But what are we going to do if the dataset is just too hard?
❑ How about … mapping the data to a higher-dimensional space?
(Figure: 1-D data on the x axis that is not separable becomes separable when mapped to the (x, x²) plane.)
Non-Linear SVM : Feature Space
❑ General idea: the original input space (x) can be mapped to some higher-dimensional feature space (φ(x)) where the training set is separable:
Φ: x = (x1, x2) → φ(x) = (x1², √2·x1x2, x2²)
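This particular mapping is exactly why the kernel trick works: with φ(x) = (x1², √2·x1x2, x2²), the inner product φ(x)ᵀφ(y) equals the degree-2 polynomial kernel (xᵀy)², so the mapping never has to be computed explicitly. The sketch below verifies this numerically for two made-up points.

```python
# Sketch: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) gives phi(x).phi(y) == (x.y)^2.
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 polynomial kernel (2-D input)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

inner_feature_space = phi(x) @ phi(y)     # inner product after the mapping
kernel_value = (x @ y) ** 2               # polynomial kernel computed in the input space
print(inner_feature_space, kernel_value)  # both equal 1.0 here: (1*3 + 2*(-1))^2 = 1
```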
Transformation to Feature Space
❑ Possible problems of the transformation
❑ High computational burden due to high dimensionality, and it is hard to get a good estimate
❑ SVM solves these two issues simultaneously
❑ “Kernel tricks” for efficient computation
❑ Minimizing ||w||² can lead to a “good” classifier
(Figure: the mapping φ(·) takes points from the input space to the feature space.)
Key idea: transform xi to a higher-dimensional space.
How to calculate the distance from a point to a line?
The equation defining the decision surface separating the classes is a hyperplane of the form:
wᵀx + b = 0
where x is the input vector, w is the normal vector, and b is the bias.
What is the distance expression for a point x to the line wᵀx + b = 0? It is |wᵀx + b| / ‖w‖.
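A small numeric sketch of the distance question above, with made-up values for w, b, and x: the distance from a point to the hyperplane wᵀx + b = 0 is |wᵀx + b| / ‖w‖.

```python
# Sketch: distance from a point x to the hyperplane w^T x + b = 0 is |w^T x + b| / ||w||.
import numpy as np

w = np.array([3.0, 4.0])   # assumed normal vector
b = -5.0                   # assumed bias
x = np.array([2.0, 1.0])   # an example point

distance = abs(w @ x + b) / np.linalg.norm(w)
print(distance)            # |3*2 + 4*1 - 5| / 5 = 5/5 = 1.0
```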
Thank You