Classifying data using Support Vector Machines (SVMs) in R
Support Vector Machines (SVMs) are supervised learning models used mainly for classification, though they can also be applied to regression tasks. In this approach, each data point is represented as a point in an n-dimensional space, where n is the number of features. The goal is to find a hyperplane that best separates the two classes.
Working of SVM Algorithm
A Support Vector Machine (SVM) is a classifier that finds a separating hyperplane to differentiate between the classes in the data. A hyperplane is a flat subspace that divides the feature space into two parts. In a two-dimensional space this is simply a line, while in higher dimensions it is a plane or hyperplane that separates the data into different categories.
Mathematically, the hyperplane can be represented as:
w \cdot x + b = 0
Where:
- w is the weight vector (normal to the hyperplane).
- x is a point on the feature space.
- b is the bias term that shifts the hyperplane.
For classification, SVM aims to maximize the margin between the classes. The margin is the distance between the hyperplane and the closest data points from each class, known as support vectors. SVM chooses the hyperplane that maximizes this margin, which is given by:
\text{Margin} = \frac{2}{\|w\|}
This ensures the largest possible separation between the classes while minimizing classification errors.
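To make this concrete, here is a minimal sketch (using a small made-up two-class dataset, so the numbers are illustrative only) that fits a linear-kernel SVM with e1071 and recovers the weight vector w, the bias b and the margin 2/||w|| from the fitted model:
R
library(e1071)

# Toy two-class data (illustrative): two Gaussian clouds in 2-D
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
y <- factor(rep(c(-1, 1), each = 20))

fit <- svm(x, y, kernel = "linear", type = "C-classification", scale = FALSE)

# For a linear kernel, w is a weighted sum of the support vectors
w <- t(fit$coefs) %*% fit$SV
b <- -fit$rho                 # e1071 stores the decision function as w.x - rho
margin <- 2 / sqrt(sum(w^2))  # margin width, 2 / ||w||
margin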
Selecting the Best Hyperplane
To determine the optimal hyperplane, the algorithm analyzes labeled training data and evaluates candidate hyperplanes by how well they separate the classes. Consider the following scenarios for selecting the best hyperplane:
Scenario 1:
In this case, we have three hyperplanes: A, B and C. The goal is to find the hyperplane that best separates the two classes, i.e. the stars and the circles. Here, hyperplane B does the best job of dividing the two classes, making it the optimal choice.

Scenario 2:
In this situation, all three hyperplanes A, B and C separate the classes well. To identify the best one, we calculate the margin, the distance between the hyperplane and the nearest data points. The hyperplane with the largest margin provides the best separation; here hyperplane C has the largest margin, making it the optimal choice.

Implementation of SVM in R
We are going to implement the SVM algorithm in R using the following steps:
1. Installing and Loading the Required Packages
We need to install and load the e1071 package, which provides the svm() function, along with caTools (train/test splitting), ggplot2 (plotting) and caret (model evaluation).
R
install.packages('e1071')    # provides svm()
install.packages('caTools')  # provides sample.split() for the train/test split
install.packages('ggplot2')  # plotting
install.packages('caret')    # provides confusionMatrix()
library(e1071)
library(caTools)
library(ggplot2)
library(caret)
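If you would rather not reinstall the packages on every run, one possible pattern (a sketch, not part of the tutorial itself) is to install only what is missing:
R
# Install only the packages that are not already available, then load them all
pkgs <- c("e1071", "caTools", "ggplot2", "caret")
missing <- pkgs[!sapply(pkgs, requireNamespace, quietly = TRUE)]
if (length(missing) > 0) install.packages(missing)
invisible(lapply(pkgs, library, character.only = TRUE))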
2. Loading the Dataset
We will use the Social Network Ads dataset stored in the file social.csv. We read it with the read.csv() function and display the first 6 rows using the head() function.
R
data = read.csv('/content/social.csv')
head(data)
Output:
sample data
3. Exploring the Dataset
We explore the dataset using the summary() function, which provides a statistical summary including measures like minimum, maximum, mean and quartiles for each column.
R
summary(data)
Output:
summary
4. Performing Data Preprocessing
We prepare the data by encoding the categorical variable Gender and scaling the continuous features Age and EstimatedSalary, then split it into training and test sets.
R
set.seed(123)

# Encode Gender numerically. Note: as.numeric() on a factor returns the
# underlying level codes (1 and 2 here), which is sufficient for the model.
data$Gender <- as.numeric(factor(data$Gender, levels = c("Male", "Female")))

# Standardize the continuous features
data[, c("Age", "EstimatedSalary")] <- scale(data[, c("Age", "EstimatedSalary")])

# 75/25 train/test split, stratified on the target
split <- sample.split(data$Purchased, SplitRatio = 0.75)
training_set <- subset(data, split == TRUE)
test_set <- subset(data, split == FALSE)
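Because sample.split() preserves the relative ratio of the target labels in both subsets, the class proportions of training and test sets should closely match. A quick optional sanity check:
R
# Compare the share of purchasers in each subset; the proportions should be similar
prop.table(table(training_set$Purchased))
prop.table(table(test_set$Purchased))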
5. Training the SVM Model
Now, we will train the SVM model using the svm() function. The model will predict whether a user purchased the product (Purchased) based on the features Age, EstimatedSalary and Gender.
R
classifier <- svm(Purchased ~ Age + EstimatedSalary + Gender,
                  data = training_set,
                  type = 'C-classification',  # classification rather than regression
                  kernel = 'radial',          # RBF kernel for a non-linear boundary
                  gamma = 0.1)                # kernel width parameter
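The choices kernel = 'radial' and gamma = 0.1 are reasonable defaults rather than tuned values. e1071 also ships a tune.svm() helper that grid-searches hyperparameters with cross-validation; here is a sketch (the candidate grids below are illustrative assumptions, not values from the tutorial):
R
# Grid-search gamma and cost with cross-validation (tune()'s default is 10-fold)
tuned <- tune.svm(Purchased ~ Age + EstimatedSalary + Gender,
                  data = training_set,
                  type = 'C-classification',
                  kernel = 'radial',
                  gamma = c(0.01, 0.1, 0.5, 1),
                  cost = c(0.1, 1, 10, 100))
summary(tuned)                      # cross-validated error for each combination
best_classifier <- tuned$best.model # refitted model with the best parameters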
6. Making Predictions
Once the model is trained, we can use it to make predictions on the test set.
R
y_pred <- predict(classifier, newdata = test_set)
table(test_set$Purchased, y_pred)  # rows: actual classes, columns: predictions
Output:
Confusion matrix
7. Evaluating the Model
We evaluate the model's performance using a confusion matrix, accuracy and related metrics such as precision, recall and F1-score; passing mode = "prec_recall" to caret's confusionMatrix() reports the latter directly.
R
cm <- table(test_set$Purchased, y_pred)
accuracy <- sum(diag(cm)) / sum(cm)
cat("Accuracy:", accuracy, "\n")
confusionMatrix(cm, mode = "prec_recall")  # also reports precision, recall and F1
Output:
Evaluation
8. Visualizing the Decision Boundary
We can also visualize the decision boundary using ggplot2. Here:
- X1: Creates a sequence for Age (with small steps).
- grid_set: Generates a grid of Age and EstimatedSalary combinations.
- grid_set$Gender: Sets default Gender using the median value.
- y_grid: Predicts the class for each grid point using the classifier.
- geom_tile: Fills grid cells with predicted class colors.
- geom_point: Plots training points with actual class colors.
- scale_fill_manual: Sets colors for predicted classes.
- scale_color_manual: Sets colors for actual training points.
R
X1 <- seq(min(training_set$Age) - 1, max(training_set$Age) + 1, by = 0.01)
X2 <- seq(min(training_set$EstimatedSalary) - 1, max(training_set$EstimatedSalary) + 1, by = 0.01)

# Name the grid columns to match the model's predictors, otherwise predict() fails
grid_set <- expand.grid(Age = X1, EstimatedSalary = X2)
grid_set$Gender <- median(training_set$Gender)  # fix Gender at its median for the 2-D plot
y_grid <- predict(classifier, newdata = grid_set)
ggplot() +
  geom_tile(data = grid_set, aes(x = Age, y = EstimatedSalary, fill = as.factor(y_grid)), alpha = 0.3) +
  geom_point(data = training_set, aes(x = Age, y = EstimatedSalary, color = as.factor(Purchased)), size = 3, shape = 21) +
  scale_fill_manual(values = c('coral1', 'aquamarine')) +
  scale_color_manual(values = c('green4', 'red3')) +
  labs(title = 'SVM Decision Boundary (Training set)', x = 'Age', y = 'Estimated Salary') +
  theme_minimal() +
  theme(legend.position = "none")
Output:
Decision Boundary
In this article we implemented the SVM algorithm in R, from data preparation and model training to evaluating its performance with accuracy, precision, recall and F1-score metrics.
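As a final usage sketch, the fitted classifier can score a single observation; here we simply reuse the first row of the test set, since it is already encoded and scaled the same way as the training data:
R
# Predict the class of one already-preprocessed observation
new_user <- test_set[1, c("Age", "EstimatedSalary", "Gender")]
predict(classifier, newdata = new_user)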