
How to perform 10 fold cross validation with LibSVM in R?

Last Updated : 30 Jul, 2024

Support Vector Machines (SVM) are a powerful tool for classification and regression tasks. LibSVM is a widely used library that implements SVM, and it can be accessed in R with the e1071 package. Cross-validation, particularly 10-fold cross-validation, is an essential technique for assessing the performance and generalizability of a model. This article will explain the theory behind 10-fold cross-validation and demonstrate how to perform it using LibSVM in R Programming Language.

Overview of 10-Fold Cross-Validation

Cross-validation is a statistical method used to estimate the skill of a model on unseen data. It is commonly used to assess the effectiveness of machine learning models.

  1. Cross-Validation: A technique to evaluate the performance of a model by partitioning the data into subsets, training the model on some subsets, and validating it on the remaining subsets.
  2. K-Fold Cross-Validation: Involves splitting the dataset into K equally-sized folds. The model is trained K times, each time using a different fold as the validation set and the remaining K-1 folds as the training set.
  3. 10-Fold Cross-Validation: A specific case of K-Fold Cross-Validation where K=10. This is a widely used method that balances computational efficiency and performance estimation accuracy.
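The partitioning idea can be illustrated in a few lines. This is a minimal sketch using a hypothetical dataset of 150 rows (not yet libsvm): each row is randomly assigned to exactly one of 10 folds.

```r
# Randomly assign each of 150 hypothetical rows to one of 10 folds
set.seed(42)
fold_id <- sample(rep(1:10, length.out = 150))

# Each row lands in exactly one fold, and every fold holds 15 rows
fold_sizes <- as.vector(table(fold_id))
print(fold_sizes)
```

In each of the 10 training rounds, the rows whose `fold_id` equals the current fold number form the validation set and all other rows form the training set.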

Now we will walk through the step-by-step implementation of 10-fold cross-validation with LibSVM in R Programming Language.

Step 1: Getting Started with e1071 and libsvm

To perform 10-fold cross-validation with libsvm, we need to install and load the e1071 package:

R
install.packages("e1071")
library(e1071)

Step 2: Prepare the Data

We will use the iris dataset for this example. The dataset will be split into features and labels.

R
# Load the iris dataset
data(iris)

# Split the dataset into features (X) and labels (y)
X <- iris[, -5]
y <- iris$Species
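Before creating folds it is worth confirming the class balance, since balanced classes let stratified folds come out even. The iris dataset ships with exactly 50 rows per species:

```r
# Load the iris dataset and inspect the label distribution
data(iris)
y <- iris$Species

# iris is perfectly balanced: 50 observations of each of the 3 species
print(table(y))
```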

Step 3: Define the Cross-Validation Folds

Create 10 folds for cross-validation using the createFolds function from the caret package.

R
# Install and load the caret package
install.packages("caret")
library(caret)

# Create 10 folds
set.seed(123)
folds <- createFolds(y, k = 10, list = TRUE)
folds 

Output:

$Fold01
[1] 1 18 26 28 45 57 59 61 98 100 126 128 129 132 143

$Fold02
[1] 5 27 29 39 46 66 74 84 86 97 101 105 106 136 149

$Fold03
[1] 4 6 32 35 47 51 53 82 88 91 114 137 138 140 146

$Fold04
[1] 3 23 33 48 50 71 75 81 85 95 102 103 113 121 130

$Fold05
[1] 2 9 12 24 42 55 58 67 77 78 109 112 125 131 144

$Fold06
[1] 10 21 38 44 49 60 62 64 83 90 119 120 122 127 142

$Fold07
[1] 7 11 19 25 43 54 70 89 92 93 108 115 118 123 124

$Fold08
[1] 8 14 17 37 41 52 56 65 73 96 135 139 141 145 148

$Fold09
[1] 15 16 22 30 40 69 72 79 87 99 116 117 134 147 150

$Fold10
[1] 13 20 31 34 36 63 68 76 80 94 104 107 110 111 133

The dataset is divided into 10 folds using the createFolds function from the caret package. Because y is a factor, createFolds samples within each class level, so every fold preserves the species proportions of the full dataset.
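The stratification can be verified directly. This sketch recreates the folds and counts the rows per fold and per species within each fold; with 50 observations per species and k = 10, every fold should hold 15 rows, 5 of each species.

```r
# Recreate the folds to verify stratification (requires the caret package)
library(caret)
data(iris)
set.seed(123)
folds <- createFolds(iris$Species, k = 10, list = TRUE)

# Rows per fold, and rows per species within each fold
fold_sizes <- sapply(folds, length)
per_class <- sapply(folds, function(idx) table(iris$Species[idx]))
print(fold_sizes)
print(per_class)
```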

Step 4: Perform 10-Fold Cross-Validation

Train the SVM model using LibSVM on each fold's training data; the evaluation on the held-out fold follows in Step 5.

R
# Initialize a vector to store the accuracy for each fold (filled in Step 5)
accuracies <- c()

# Loop over each fold and train an SVM on the other nine folds
for (i in 1:10) {
  # Split the data into training and test sets
  test_indices <- folds[[i]]
  train_data <- X[-test_indices, ]
  train_labels <- y[-test_indices]
  test_data <- X[test_indices, ]
  test_labels <- y[test_indices]

  # Train the SVM model using LibSVM
  svm_model <- svm(train_data, train_labels, type = 'C-classification',
                   kernel = 'linear')
}

# Inspect the model fitted on the last fold
summary(svm_model)

Output:

Call:
svm.default(x = train_data, y = train_labels, type = "C-classification",
kernel = "linear")


Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1

Number of Support Vectors: 28

( 2 15 11 )


Number of Classes: 3

Levels:
setosa versicolor virginica

For each fold, the model is trained on the training data (9 folds) and evaluated on the test data (1 fold).
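As an aside, e1071 can also run the cross-validation internally: svm() accepts a cross argument, and the fitted object then carries the per-fold accuracies and their average. A minimal sketch:

```r
library(e1071)
data(iris)

# Ask svm() itself to perform 10-fold cross-validation during fitting
set.seed(123)
model <- svm(Species ~ ., data = iris, type = "C-classification",
             kernel = "linear", cross = 10)

# Per-fold accuracies and the overall accuracy, computed by libsvm
print(model$accuracies)
print(model$tot.accuracy)
```

This is convenient for a quick estimate, while the explicit loop in this article gives full control over how folds are built and which metrics are recorded.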

Step 5: Evaluate the model performance

Now we put the full loop together: for each fold, train on the remaining nine folds, predict on the held-out fold, and record the accuracy. Finally, we average the ten accuracies.

R
# Initialize a vector to store accuracy for each fold
accuracies <- c()

# Loop over each fold: train, predict, and score
for (i in 1:10) {
  # Split the data into training and test sets
  test_indices <- folds[[i]]
  train_data <- X[-test_indices, ]
  train_labels <- y[-test_indices]
  test_data <- X[test_indices, ]
  test_labels <- y[test_indices]

  # Train the SVM model using LibSVM
  svm_model <- svm(train_data, train_labels, type = 'C-classification',
                   kernel = 'linear')

  # Make predictions on the test set
  predictions <- predict(svm_model, test_data)

  # Calculate accuracy and store it
  accuracy <- sum(predictions == test_labels) / length(test_labels)
  accuracies <- c(accuracies, accuracy)
}

# Calculate and print the average accuracy
average_accuracy <- mean(accuracies)
print(paste("Average Accuracy:", round(average_accuracy * 100, 2), "%"))

Output:

[1] "Average Accuracy: 96.67 %"

The average accuracy across all 10 folds is calculated and printed, providing an estimate of the model's performance.
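A single mean can hide fold-to-fold variation, so it is common to report the spread alongside it. This sketch uses hypothetical per-fold values (five perfect folds and five with one miss out of 15 test rows) to show the reporting pattern:

```r
# Hypothetical per-fold accuracies: 5 perfect folds, 5 with one miss (14/15)
accuracies <- c(rep(1, 5), rep(14/15, 5))

# Report both the mean and the standard deviation across folds
cat(sprintf("Mean: %.2f%% | SD: %.2f%%\n",
            mean(accuracies) * 100, sd(accuracies) * 100))
```

A small standard deviation suggests the model's performance is stable across different subsets of the data.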

Conclusion

Performing 10-fold cross-validation in R is straightforward with the e1071 package, which provides an interface to libsvm, together with caret for fold creation. By following the steps and examples provided in this article, you can ensure that your models are robustly evaluated.

By integrating cross-validation into your modeling workflow, you can improve the reliability of your performance estimates and catch overfitting early, whether you are working with support vector machines or other classifiers.

