2024-SCU-ML-2-1-SVM
Machine Learning
Yen-Kuang Chen, Ph.D., IEEE Fellow
[email protected]
1
Summary of Last Week’s Materials
• Why do we want to use machine learning?
• A huge amount of information must be processed
• However, no simple programmable rules
• ML can be an alternative route to building complicated systems
• Y: +1 (malignant), −1 (benign)
• Note y is single-dimensional in this example
In ℝ² (the 2D plane) vs. in ℝᵈ (higher dimensions):
• Features of tumor x⃗: points on the 2D plane → points in ℝᵈ
• Labels y: +1 (malignant), −1 (benign)
• Hypothesis g(x⃗): a line → a hyperplane in ℝᵈ
  • Positive on one side of the line (hyperplane); negative on the other side
After having the model, how can we find {w_i} and the threshold?
→ ML algorithms
PLA: A Simple Learning Algorithm
• Start from some random w⃗_0, and correct its mistakes on D
• For t = 0, 1, …
  • Find a mistake (x⃗_n(t), y_n(t))
  • Based on the mistake, "correct" w⃗_t → w⃗_{t+1}
• until no more mistakes (see the sketch below)
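As a quick refresher, here is a minimal sketch of PLA in Python (the data format and the zero initialization are illustrative assumptions, not part of the slides):

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm on linearly separable data.

    X: (N, d) array of samples; y: (N,) array of labels in {+1, -1}.
    """
    X = np.hstack([X, np.ones((len(X), 1))])  # absorb the threshold/bias into w
    w = np.zeros(X.shape[1])                  # start from some w_0 (zeros here)
    for _ in range(max_iters):
        mistakes = np.where(np.sign(X @ w) != y)[0]
        if mistakes.size == 0:                # no more mistakes -> done
            return w
        i = mistakes[0]                       # a mistake (x_n(t), y_n(t))
        w = w + y[i] * X[i]                   # "correct" w_t -> w_{t+1}
    return w
```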
Review of Perceptron Learning Algorithm
• What kinds of data were given?
• Labeled (often created by humans)
• What kinds of labels? Supervised classification
• Two classes
• Assumptions?
• Linearly separable
• Learning algorithm?
• Iteratively updates its weights in response to errors in its prediction
Remaining Question
• How can machines learn better?
• What are the strategies to improve performance/efficiency?
• Review of PLA
• Assumptions: Linearly separable data
• Learning algorithm: Iteratively updates its weights in response to errors in its
prediction
• Guarantee: It will find a linear boundary that separates the data (given that the data are linearly separable)
• However,
• How can we find a separating linear boundary faster?
• If the data are linearly separable, is PLA the best algorithm?
Outline
• Course Atlas
• Support Vector Machine (SVM)
• Inferencing: Decision rule
• Training: Objective function/optimization problem
• Transforming from good to great
9
Course Atlas
• Supervised
  • Classification: Perceptron Learning Algorithm (PLA), Support Vector Machines (SVM), Decision Trees, Linear Discriminant Analysis (LDA)
  • Regression: Linear regression
• Unsupervised
  • Clustering: K-means
  • Dimension reduction: PCA
10
Quiz
• Which of the following problems is a supervised classification algorithm
best suited for? (select one)
• Forecast sales based on past sales data, economy outlook, etc.
• Identify whether an image contains a dog
• Split the dataset into groups based on their similarities
• Reduce the dimensionality of a dataset
11
Basic Assumptions
• Input data (also called samples)
  • x⃗_i
• Binary labels
  • y_i = +1 for positive samples
  • y_i = −1 for negative samples
• Linearly separable
• Clean
[Figure: a messy, non-separable scatter of + and − points (crossed out) vs. a clean, linearly separable one]
12
Support Vector Machine (SVM)
• To classify future data points with more confidence.
• Margin should be maximal
[Figure: the same + and − points separated by a narrow-margin line (crossed out) vs. a wide-margin line]
13
Decision Rule
• Let's define w⃗ to be perpendicular to the median (the decision boundary)
• w⃗ can have any length for now
[Figure: w⃗ drawn perpendicular to the separating line, with + samples on one side, − samples on the other, and an unknown sample x⃗]
• Decision rule
  • w⃗ · x⃗ ≥ c ⟹ positive; w⃗ · x⃗ < c ⟹ negative
  • Equivalently (with b = −c): w⃗ · x⃗ + b ≥ 0 ⟹ positive; w⃗ · x⃗ + b < 0 ⟹ negative
• Decision rule with margins
  • w⃗ · x⃗ + b ≥ +1 ⟹ positive; w⃗ · x⃗ + b ≤ −1 ⟹ negative
  • Equivalently, y_i (w⃗ · x⃗_i + b) ≥ 1
• For x⃗_i on the edges of the margin, y_i (w⃗ · x⃗_i + b) = 1
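As a concrete illustration, a minimal sketch of the decision rule in Python (the values of w, b, and the samples are placeholders, not from the slides):

```python
import numpy as np

def svm_predict(w, b, x):
    """Classify a sample with the linear decision rule: sign(w . x + b)."""
    score = np.dot(w, x) + b
    return +1 if score >= 0 else -1

# Hypothetical weight vector and bias
w = np.array([2.0, -1.0])
b = -0.5
print(svm_predict(w, b, np.array([1.0, 0.5])))   # score = 1.0  -> +1 (positive side)
print(svm_predict(w, b, np.array([-1.0, 1.0])))  # score = -3.5 -> -1 (negative side)
```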
14
How to Train an SVM?
• How can we get 𝑤 and 𝑏, given the labeled dataset?
15
Objective Function
• The objective is to get the widest margin
• Let's find an x⃗_+ on the positive edge of the margin and an x⃗_− on the negative edge of the margin
• Width of the margin = (w⃗ / ‖w⃗‖) · (x⃗_+ − x⃗_−) = (w⃗ · x⃗_+ − w⃗ · x⃗_−) / ‖w⃗‖
• Note that for x⃗_i on the edge of the margin, y_i (w⃗ · x⃗_i + b) = 1
  • w⃗ · x⃗_+ = 1 − b
  • w⃗ · x⃗_− = −1 − b
• Width of the margin = 2 / ‖w⃗‖
• Objective: max 2/‖w⃗‖ ≡ min ‖w⃗‖
[Figure: the margin between the positive and negative edges, with x⃗_+ and x⃗_− on the edges and w⃗ normal to them]
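For reference, the same chain of steps written as one equation (a LaTeX restatement of the bullets above):

```latex
\[
\text{width}
= \frac{\vec{w}}{\|\vec{w}\|}\cdot(\vec{x}_{+}-\vec{x}_{-})
= \frac{\vec{w}\cdot\vec{x}_{+}-\vec{w}\cdot\vec{x}_{-}}{\|\vec{w}\|}
= \frac{(1-b)-(-1-b)}{\|\vec{w}\|}
= \frac{2}{\|\vec{w}\|}
\]
```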
16
Constrained Optimization Problem
• Objective: max 2/‖w⃗‖ ≡ min ‖w⃗‖ ≡ min ½ ‖w⃗‖²
• Constraint: y_i (w⃗ · x⃗_i + b) ≥ 1 for every sample, with equality for x⃗_i on the edge of the margin
• Lagrangian: ℒ = ½ ‖w⃗‖² − Σ_i α_i [ y_i (w⃗ · x⃗_i + b) − 1 ]
• Note that
  • α_i > 0 when x⃗_i is on the edge of the margin (a support vector)
  • α_i = 0 when x⃗_i is not on the edge of the margin
• To find the minimum, set ∂ℒ/∂w⃗ = 0 and ∂ℒ/∂b = 0
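Putting these together, the training problem can be written in standard form (a restatement of the objective and constraint above):

```latex
\[
\min_{\vec{w},\,b}\ \tfrac{1}{2}\|\vec{w}\|^{2}
\quad \text{subject to} \quad
y_i\,(\vec{w}\cdot\vec{x}_i + b) \ge 1 \ \text{for all } i
\]
```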
17
ℒ = ½ ‖w⃗‖² − Σ_i α_i [ y_i (w⃗ · x⃗_i + b) − 1 ]
Quadratic Programming
• ∂ℒ/∂w⃗ = w⃗ − Σ_i α_i y_i x⃗_i = 0 ⟹ w⃗ = Σ_i α_i y_i x⃗_i
• ∂ℒ/∂b = − Σ_i α_i y_i = 0 ⟹ Σ_i α_i y_i = 0
• ℒ = ½ ‖w⃗‖² − Σ_i α_i [ y_i (w⃗ · x⃗_i + b) − 1 ]
    = ½ (Σ_i α_i y_i x⃗_i) · (Σ_j α_j y_j x⃗_j) − (Σ_i α_i y_i x⃗_i) · (Σ_j α_j y_j x⃗_j) − (Σ_i α_i y_i) b + Σ_i α_i
    = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x⃗_i · x⃗_j)
• Given the x⃗_i and y_i, we only need to solve for the α_i that optimize ℒ (the dual problem)
  • Looks simple
  • It is a quadratic programming problem
  • Quadratic programming is NP-hard in general; the SVM problem is convex, but still non-trivial to solve at scale
There are packages to help solve the SVM optimization problem.
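For example, a minimal sketch using scikit-learn's SVC with a linear kernel (the toy dataset here is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, hypothetical linearly separable dataset (labels in {+1, -1})
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [0.0, 0.0], [1.0, 0.5], [0.5, -1.0]])
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # very large C ~ hard-margin behavior
clf.fit(X, y)

print("w =", clf.coef_[0])                  # learned weight vector
print("b =", clf.intercept_[0])             # learned bias
print("support vectors:", clf.support_vectors_)
print("alpha_i * y_i:", clf.dual_coef_)     # dual coefficients of the support vectors
print("prediction:", clf.predict([[2.0, 1.0]]))
```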
18
Quiz
• What is the primary objective of a Support Vector Machine (SVM)?
1. Maximize the margin between support vectors
2. Minimize the margin between support vectors
3. Maximize the number of support vectors
4. Minimize the number of support vectors
19
Quiz
• In SVM classification, what is the term 'support vectors' referring to?
1. Data points that are hard to classify
2. Data points that lie closest to the decision boundary
3. Data points that belong to the majority class
4. Data points that belong to the minority class
20
Transforming from Good to Great
• What if the data are not linearly separable?
• What if the given data are not clean?
[Figure: two scatter plots of + and − points, one that no line can separate and one with a few noisy outliers on the wrong side]
21
Polynomial Kernels
Source: U. Braga-Neto, Fundamentals of Pattern Recognition and Machine Learning, Springer, 2020
23
Radial Basis Function (RBF) Kernel
• One of the most popular non-linear kernels in SVM
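As an illustration of picking a kernel in practice, a minimal scikit-learn sketch (the dataset, gamma, and degree values are placeholders); the RBF kernel computes K(x, x') = exp(−γ‖x − x'‖²):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Hypothetical non-linearly-separable data: one class inside a circle, one outside
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

rbf_clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)   # K(x, x') = exp(-gamma * ||x - x'||^2)
poly_clf = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)  # polynomial kernel of degree 3

print("RBF  accuracy:", rbf_clf.score(X, y))
print("Poly accuracy:", poly_clf.score(X, y))
```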
24
Quiz
• What is the 'kernel trick' in SVMs used for?
1. To turn a linearly inseparable problem into a linearly separable one
2. To increase the dimensionality of the feature space
3. To reduce the dimensionality of the feature space
4. To adjust the cost parameter 'C'
25
Soft-margin SVM
• There may be outliers or noise in the data from real-world applications.
• To address this issue, a soft margin can be used in a modified
optimization problem, known as a soft-margin SVM:
• Objective: min ½ ‖w⃗‖² + C Σ_i ξ_i
• Constraint: y_i (w⃗ · x⃗_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0
• ξ_i is the slack, which allows x⃗_i to be inside the margin
• SVM without the slacks is known as hard-margin SVM.
[Figure: a separating line and its margin, with w⃗ shown and one sample x⃗ inside the margin at slack distance ξ_i]
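For intuition about the slack variables, a small sketch computing ξ_i = max(0, 1 − y_i (w⃗ · x⃗_i + b)) for hypothetical values of w, b, and a few samples (these numbers are made up for illustration):

```python
import numpy as np

# Hypothetical trained parameters (placeholders, not from the slides)
w = np.array([1.0, 1.0])
b = -3.0

# Three hypothetical positive samples at different distances from the boundary
X = np.array([[3.0, 2.0],
              [2.0, 1.5],
              [1.0, 1.0]])
y = np.array([+1, +1, +1])

# Slack needed so that y_i * (w . x_i + b) >= 1 - xi_i holds
margins = y * (X @ w + b)            # y_i * (w . x_i + b)
xi = np.maximum(0.0, 1.0 - margins)
print(xi)                            # xi = [0, 0.5, 2]
```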
26
Group Discussion
• Where is x⃗_i relative to where the margin is when its ξ_i value is 0?
• Where is x⃗_i relative to where the margin is when 0 ≤ ξ_i ≤ 1?
• Where is x⃗_i relative to where the margin is when ξ_i > 1?
[Figure: the soft-margin picture from the previous slide, showing w⃗, the margin, and a sample with slack ξ_i]
27
Quiz
• What is the main difference between hard-margin and soft-margin
SVMs?
1. Hard-margin SVMs have a wider margin
2. Soft-margin SVMs allow for misclassification
3. Soft-margin SVMs have no margin
4. Soft-margin SVMs do not use kernels
28
Quiz
• What is the purpose of the hyperparameter 'C' in soft-margin SVMs?
1. To control the width of the margin
2. To control the trade-off between margin width and misclassification
3. To specify the number of support vectors
4. To adjust the kernel function
29
Quiz
• What happens if the cost parameter 'C' in a soft-margin SVM is set to
a very high value?
1. The margin becomes wider
2. The margin becomes narrower
3. More misclassifications are allowed
4. Fewer misclassifications are allowed
30
Summary of SVM
• Finding a hyperplane that separates the data with a maximal margin,
• In order to classify future data points with more confidence
• To find the optimal hyperplane, we need to solve a quadratic optimization problem
• The name "Support Vector Machine" (SVM) comes from using support
vectors to create a hyperplane that separates the data
• The support vectors are the data points that are closest to the hyperplane
• Kernels in SVMs are used to turn a linearly inseparable problem into a
linearly separable one
• Soft-margin SVMs allow for misclassification
• Fewer misclassifications are allowed if the cost parameter 'C' in a soft-margin SVM is
set to a very high value
31
Geometry Review for Homework #2
• Find the line that separates 3 or 4 points with the maximal margin
  • If 3 points form an isosceles triangle (one + point and two − points)
  • If 4 points form a rectangle (two + points and two − points)
Scikit-Learn SVM packages
Class      Library    Time complexity          Kernel trick
LinearSVC  Liblinear  O(m × n)                 No
SVC        Libsvm     O(m² × n) to O(m³ × n)   Yes
(m = number of training samples, n = number of features)
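A small usage sketch contrasting the two classes (the dataset and hyperparameter values are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Hypothetical dataset: m = 1000 samples, n = 20 features
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Linear SVM trained by liblinear: roughly O(m*n), no kernel trick
linear_clf = LinearSVC(C=1.0).fit(X, y)

# Kernelized SVM trained by libsvm: more expensive, but supports kernels
kernel_clf = SVC(kernel="rbf", C=1.0).fit(X, y)

print(linear_clf.score(X, y), kernel_clf.score(X, y))
```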
Group Discussions
• Compare Perceptron Learning Algorithm (PLA) and Support Vector
Machines (SVM)
• Both are classification algorithms
• Limitation: PLA is a linear algorithm that finds the hyperplane to separate the
data, while SVM is a more flexible algorithm that can use nonlinear kernels to
separate the data.
• Quality: SVM finds the hyperplane with the largest margin, which is the
distance between the hyperplane and the nearest data points. PLA does not
consider margin and simply finds a hyperplane that separates the data.
• Computation: PLA can be faster and more computationally efficient than SVM
on simple datasets with linear boundaries.
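To make the comparison concrete, a minimal sketch fitting scikit-learn's Perceptron (closely related to PLA) and a linear SVC on the same made-up data:

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

# Hypothetical linearly separable 2D data
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 50 + [-1] * 50)

pla = Perceptron().fit(X, y)                 # stops at *some* separating hyperplane
svm = SVC(kernel="linear", C=1e6).fit(X, y)  # picks the maximum-margin hyperplane

print("PLA boundary:", pla.coef_[0], pla.intercept_[0])
print("SVM boundary:", svm.coef_[0], svm.intercept_[0])
```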
34
List of Key Questions About Each Machine
Learning Algorithm
• What kinds of data were given?
• Labeled?
• What kind of labels?
• Continuous or classification labels? Number of categories?
• Assumptions?
• Linearly separable?
• Learning algorithm?
• Direct optimization? Iterative optimization? Parameters?
• Inference computation?
• Linear? Polynomial? Non-linear?
• Error function?
• Class labels? Probabilities? Manual threshold?
• Overfitting or underfitting?
35
Any Questions About HW#1?
36