SVM PDF
Jean-Philippe Vert
• Test a SVM classifier for cancer diagnosis from gene expression data
1 Linear SVM
Here we generate a toy dataset in 2D, and learn how to train and test a SVM.
1.1 Generate toy data
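The data-generation code did not survive the PDF extraction; a minimal sketch of what it could look like, assuming each class is drawn from an isotropic 2D Gaussian (the names n, meanpos and meanneg are hypothetical, chosen to match the variables mentioned later in the text):

```r
set.seed(123)
n <- 150            # total number of points (assumption)
meanpos <- 0        # centre of the positive class (hypothetical name)
meanneg <- 3        # centre of the negative class (hypothetical name)
npos <- round(n / 2)
nneg <- n - npos
# Draw positive and negative examples from two 2D Gaussians
xpos <- matrix(rnorm(npos * 2, mean = meanpos, sd = 1), npos, 2)
xneg <- matrix(rnorm(nneg * 2, mean = meanneg, sd = 1), nneg, 2)
x <- rbind(xpos, xneg)
y <- c(rep(1, npos), rep(-1, nneg))   # labels in {-1, +1}
```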
[Figure: 2D scatter plot of the toy data (x[,1] vs. x[,2]), positive and negative examples in different colors]
Now we split the data into a training set (80%) and a test set (20%)
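The splitting code itself is missing from the extraction; one possible version, assuming x and y as generated above (the istrain indicator is what the plotting code below uses; the first two lines are a stand-in for the toy data):

```r
set.seed(123)
x <- matrix(rnorm(300), 150, 2)     # stand-in for the toy data above
y <- c(rep(1, 75), rep(-1, 75))
n <- nrow(x)
ntrain <- round(n * 0.8)            # 80% of the points for training
tindex <- sample(n, ntrain)         # random indices of the training set
xtrain <- x[tindex, ]; ytrain <- y[tindex]
xtest  <- x[-tindex, ]; ytest  <- y[-tindex]
istrain <- rep(0, n); istrain[tindex] <- 1   # 1 = training point, 0 = test point
```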
# Visualize
plot(x, col = ifelse(y > 0, 1, 2), pch = ifelse(istrain == 1, 1, 2))
legend("topleft", c("Positive Train", "Positive Test", "Negative Train", "Negative Test"),
       col = c(1, 1, 2, 2), pch = c(1, 2, 1, 2), text.col = c(1, 1, 2, 2))
1.2 Train a SVM
[Figure: toy data with training points (circles) and test points (triangles), positive and negative classes in different colors]
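The training code on this page was lost in extraction; a plausible reconstruction, based on the ksvm call that appears later in the text (C-svc with the linear "vanilladot" kernel, no scaling). The first lines regenerate stand-in training data so the sketch is self-contained:

```r
library(kernlab)
set.seed(123)
# Stand-in for xtrain/ytrain from the 80/20 split above
xtrain <- rbind(matrix(rnorm(120, mean = 0), 60, 2),
                matrix(rnorm(120, mean = 3), 60, 2))
ytrain <- c(rep(1, 60), rep(-1, 60))
# Train a linear SVM (C-svc = C-classification, vanilladot = linear kernel)
svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = "vanilladot",
            C = 100, scaled = c())
print(svp)
```

The fitted model can then be visualized with plot(svp, data = xtrain), which draws the decision boundary and margins over the training points.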
1.3 Predict with a SVM
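The prediction code did not survive extraction; the accuracy computation below assumes objects like ypred and ypredscore. A sketch using kernlab's predict, with stand-in data so it runs on its own:

```r
library(kernlab)
set.seed(123)
x <- rbind(matrix(rnorm(200, mean = 0), 100, 2),
           matrix(rnorm(200, mean = 3), 100, 2))
y <- c(rep(1, 100), rep(-1, 100))
tindex <- sample(200, 160)
xtest <- x[-tindex, ]; ytest <- y[-tindex]
svp <- ksvm(x[tindex, ], y[tindex], type = "C-svc", kernel = "vanilladot",
            C = 100, scaled = c())
# Predicted labels on the test set
ypred <- predict(svp, xtest)
# Real-valued decision scores (sign gives the predicted label)
ypredscore <- predict(svp, xtest, type = "decision")
```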
# Compute accuracy
sum(ypred == ytest) / length(ytest)
# Check that the predicted labels are the signs of the scores
table(ypredscore > 0, ypred)
1.4 Cross-validation
Instead of fixing a training set and a test set, we can improve the quality of these estimates by running k-fold cross-validation. We split the training set into k groups of approximately the same size, then iteratively train a SVM using k - 1 groups and make predictions on the group that was left aside. When k equals the number of training points, this is called leave-one-out (LOO) cross-validation. To generate a random split of n points in k folds, we can for example create the following function:
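The function itself is missing from the extracted text; one possible version (the name cv.folds is hypothetical):

```r
cv.folds <- function(n, folds = 3) {
  # Shuffle the indices 1..n, then deal them into 'folds' groups
  # of approximately equal size
  split(sample(seq_len(n)), rep(1:folds, length.out = n))
}
```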
QUESTION2 - Write a function cv.ksvm = function(x, y, folds = 3, ...) which returns a vector ypred of predicted decision scores for all points by k-fold cross-validation.
QUESTION3 - Compute the various performance measures of the SVM by 5-fold cross-validation. Alternatively, the ksvm function can automatically compute the k-fold cross-validation accuracy:
svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = 100, scaled=c(), cross = 5)
print(cross(svp))
print(error(svp))
1.5 Effect of C
The C parameter balances the trade-off between having a large margin and separating the positive and negative examples on the training set. It is important to choose it well to achieve good generalization.
QUESTION5 - Plot the decision functions of SVM trained on the toy examples for different values of C in the range 2^seq(-10, 14). To look at the different plots you can use the function par(ask = T), which will ask you to press a key between successive plots. Alternatively, you can use par(mfrow = c(5, 5)) to see all the plots in the same window.
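A sketch of the kind of loop this calls for, with kernlab (the first lines regenerate stand-in toy data; here only every fourth power of 2 is tried to keep it short):

```r
library(kernlab)
set.seed(123)
x <- rbind(matrix(rnorm(200, mean = 0), 100, 2),
           matrix(rnorm(200, mean = 3), 100, 2))
y <- c(rep(1, 100), rep(-1, 100))
par(ask = TRUE)
for (C in 2^seq(-10, 14, by = 4)) {
  # Retrain the linear SVM for each value of C and plot its decision function
  svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = C, scaled = c())
  plot(svp, data = x)
}
```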
QUESTION7 - Do the same on data with more overlap between the two classes, e.g., regenerate toy data with meanneg being 1.
2 Nonlinear SVM
Sometimes linear SVMs are not enough. For example, generate a toy dataset where positive and negative examples are mixtures of two Gaussians which are not linearly separable.
QUESTION8 - Make a toy example that looks like Figure 2, and test a linear SVM with
different values of C.
To solve this problem, we should instead use a nonlinear SVM. This is obtained by simply changing the
kernel parameter. For example, to use a Gaussian RBF kernel with σ = 1 and C = 1:
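The training call itself is missing from the extraction; with kernlab it would look like the sketch below (the first lines are a stand-in for the nonlinearly separable toy data):

```r
library(kernlab)
set.seed(123)
# Stand-in for the mixture-of-Gaussians toy data
x <- rbind(matrix(rnorm(200, mean = 0), 100, 2),
           matrix(rnorm(200, mean = 2), 100, 2))
y <- c(rep(1, 100), rep(-1, 100))
# Gaussian RBF kernel with sigma = 1 and C = 1
svp <- ksvm(x, y, type = "C-svc", kernel = "rbfdot",
            kpar = list(sigma = 1), C = 1)
```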
# Visualize it
plot(svp, data = x)
You should obtain something that looks like Figure 3. Much better than the linear SVM, no? The nonlinear SVM now has two parameters: σ and C. Both play a role in the generalization capacity of the SVM.
QUESTION9 - Visualize and compute the 5-fold cross-validation error for different values of
C and σ. Observe their influence.
A useful heuristic to choose σ is implemented in kernlab. It is based on the quantiles of the distances between the training points.
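This heuristic is exposed as kernlab's sigest function, and ksvm applies it when kpar = "automatic" is used with the RBF kernel; a sketch on stand-in data:

```r
library(kernlab)
set.seed(123)
x <- rbind(matrix(rnorm(200, mean = 0), 100, 2),
           matrix(rnorm(200, mean = 2), 100, 2))
y <- c(rep(1, 100), rep(-1, 100))
# sigest() estimates a good range of sigma from quantiles of
# pairwise distances between training points
print(sigest(x))
# ksvm uses this heuristic when kpar = "automatic"
svp <- ksvm(x, y, type = "C-svc", kernel = "rbfdot",
            kpar = "automatic", C = 1)
```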
# Visualize it
plot(svp, data = x)
QUESTION11 - Test the polynomial, hyperbolic tangent, Laplacian, Bessel and ANOVA kernels on the toy examples.
3 Application: cancer diagnosis from gene expression data
library(ALL)
data(ALL)
# Inspect them
?ALL
show(ALL)
print(summary(pData(ALL)))
Here we focus on predicting the type of the disease (B-cell or T-cell). We get the expression data and disease
type as follows
x <- t(exprs(ALL))
y <- substr(ALL$BT,1,1)
QUESTION12 - Test the ability of a SVM to predict the class of the disease from gene expression. Check the influence of the parameters.
Finally, we may want to predict the type and stage of the diseases. We are then confronted with a multi-class
classification problem, since the variable to predict can take more than two values:
y <- ALL$BT
print(y)
## [1] B2 B2 B4 B1 B2 B1 B1 B1 B2 B2 B3 B3 B3 B2 B3 B B2 B3 B2 B3 B2 B2 B2
## [24] B1 B1 B2 B1 B2 B1 B2 B B B2 B2 B2 B1 B2 B2 B2 B2 B2 B4 B4 B2 B2 B2
## [47] B4 B2 B1 B2 B2 B3 B4 B3 B3 B3 B4 B3 B3 B1 B1 B1 B1 B3 B3 B3 B3 B3 B3
## [70] B3 B3 B1 B3 B1 B4 B2 B2 B1 B3 B4 B4 B2 B2 B3 B4 B4 B4 B1 B2 B2 B2 B1
## [93] B2 B B T T3 T2 T2 T3 T2 T T4 T2 T3 T3 T T2 T3 T2 T2 T2 T1 T4 T
## [116] T2 T3 T2 T2 T2 T2 T3 T3 T3 T2 T3 T2 T
## Levels: B B1 B2 B3 B4 T T1 T2 T3 T4
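kernlab's ksvm handles multi-class problems natively (for C-svc it trains pairwise one-against-one classifiers by default), so the same call works when y is a factor with more than two levels. A sketch on small stand-in data (in the practical, x and y come from the ALL dataset above):

```r
library(kernlab)
set.seed(123)
# Stand-in for the expression matrix and multi-class labels
x <- matrix(rnorm(60 * 5), 60, 5)
y <- factor(rep(c("B1", "B2", "T2"), each = 20))
svp <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = 1, cross = 5)
print(cross(svp))   # 5-fold cross-validation error
```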
QUESTION13 - Test the ability of a SVM to predict the class and the stage of the disease
from gene expression.