How to perform 10-fold cross-validation with LibSVM in R?
Support Vector Machines (SVM) are a powerful tool for classification and regression tasks. LibSVM is a widely used library that implements SVMs, and it can be accessed in R through the e1071 package. Cross-validation, particularly 10-fold cross-validation, is an essential technique for assessing the performance and generalizability of a model. This article explains the theory behind 10-fold cross-validation and demonstrates how to perform it using LibSVM in the R programming language.
Overview of 10-Fold Cross-Validation
Cross-validation is a statistical method used to estimate the skill of a model on unseen data. It is commonly used to assess the effectiveness of machine learning models.
- Cross-Validation: A technique to evaluate the performance of a model by partitioning the data into subsets, training the model on some subsets, and validating it on the remaining subsets.
- K-Fold Cross-Validation: Involves splitting the dataset into K equally-sized folds. The model is trained K times, each time using a different fold as the validation set and the remaining K-1 folds as the training set.
- 10-Fold Cross-Validation: A specific case of K-Fold Cross-Validation where K = 10. This is a widely used method that balances computational efficiency and performance estimation accuracy; a minimal sketch of the mechanics follows this list.
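Before the step-by-step example, here is a minimal base-R sketch of the generic K-fold split, just to make the mechanics concrete. It is illustrative only: unlike caret's createFolds used later in this article, it does not stratify the folds by class.
R
# Minimal sketch: randomly assign each of n rows to one of k folds
set.seed(42)
n <- 150   # e.g. the number of rows in iris
k <- 10
fold_id <- sample(rep(1:k, length.out = n))

# Rows with fold_id == i form the validation set of fold i;
# all remaining rows form its training set
which(fold_id == 1)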
Now we will walk through, step by step, how to perform 10-fold cross-validation with LibSVM in R.
Step 1: Getting Started with e1071 and LibSVM
To perform 10-fold cross-validation with LibSVM, we first need to install and load the e1071 package:
R
install.packages("e1071")
library(e1071)
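If you rerun scripts often, a common convenience (optional, not required by the article's workflow) is to install the package only when it is missing:
R
# Install e1071 only if it is not already available, then load it
if (!requireNamespace("e1071", quietly = TRUE)) {
  install.packages("e1071")
}
library(e1071)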
Step 2: Prepare the Data
We will use the iris dataset for this example and split it into features and labels.
R
# Load the iris dataset
data(iris)
# Split the dataset into features (X) and labels (y)
X <- iris[, -5]
y <- iris$Species
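As a quick sanity check (illustrative, not part of the core pipeline), we can confirm the dimensions and the class balance before modeling:
R
# iris has 150 rows, 4 numeric features and 3 balanced classes
dim(X)     # 150   4
table(y)   # setosa: 50, versicolor: 50, virginica: 50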
Step 3: Define the Cross-Validation Folds
Create 10 folds for cross-validation using the createFolds function from the caret package.
R
# Install and load the caret package
install.packages("caret")
library(caret)
# Create 10 folds
set.seed(123)
folds <- createFolds(y, k = 10, list = TRUE)
folds
Output:
$Fold01
[1] 1 18 26 28 45 57 59 61 98 100 126 128 129 132 143
$Fold02
[1] 5 27 29 39 46 66 74 84 86 97 101 105 106 136 149
$Fold03
[1] 4 6 32 35 47 51 53 82 88 91 114 137 138 140 146
$Fold04
[1] 3 23 33 48 50 71 75 81 85 95 102 103 113 121 130
$Fold05
[1] 2 9 12 24 42 55 58 67 77 78 109 112 125 131 144
$Fold06
[1] 10 21 38 44 49 60 62 64 83 90 119 120 122 127 142
$Fold07
[1] 7 11 19 25 43 54 70 89 92 93 108 115 118 123 124
$Fold08
[1] 8 14 17 37 41 52 56 65 73 96 135 139 141 145 148
$Fold09
[1] 15 16 22 30 40 69 72 79 87 99 116 117 134 147 150
$Fold10
[1] 13 20 31 34 36 63 68 76 80 94 104 107 110 111 133
The dataset is divided into 10 folds using the createFolds function from the caret package. Each element of the returned list holds the row indices of one fold, and because createFolds stratifies on y, every fold contains a similar mix of the three species.
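As a quick check (illustrative only), we can confirm the fold sizes and the class balance within a fold:
R
# Each fold should contain about 15 observations (150 / 10)
sapply(folds, length)

# Because createFolds stratifies on y, each fold holds roughly
# 5 observations of each species
table(y[folds$Fold01])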
Step 4: Perform 10-Fold Cross-Validation
Train an SVM model with LibSVM on each fold's training data. Note that the loop opened in this step is closed in Step 5, where the evaluation happens.
R
# Initialize a vector to store accuracy for each fold
accuracies <- c()

# Loop over each fold (the loop body continues in Step 5,
# where predictions are made and the loop is closed)
for (i in 1:10) {
  # Split the data into training and test sets
  test_indices <- folds[[i]]
  train_data <- X[-test_indices, ]
  train_labels <- y[-test_indices]
  test_data <- X[test_indices, ]
  test_labels <- y[test_indices]

  # Train the SVM model using LibSVM
  svm_model <- svm(train_data, train_labels, type = 'C-classification',
                   kernel = 'linear')

  # Inside a loop, summary() must be wrapped in print() to display
  print(summary(svm_model))
Output:
Call:
svm.default(x = train_data, y = train_labels, type = "C-classification",
kernel = "linear")
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1
Number of Support Vectors: 28
( 2 15 11 )
Number of Classes: 3
Levels:
setosa versicolor virginica
For each fold, the model is trained on the training data (9 folds) and evaluated on the test data (1 fold); the summary above shows the fit for one fold.
Step 5: Evaluate the Model Performance
Now we complete the loop from Step 4: predict on each held-out fold, record that fold's accuracy, and, after closing the loop, average the accuracies.
R
  # Make predictions on the held-out test fold (still inside the loop)
  predictions <- predict(svm_model, test_data)

  # Calculate accuracy for this fold
  accuracy <- sum(predictions == test_labels) / length(test_labels)

  # Store the accuracy
  accuracies <- c(accuracies, accuracy)
}

# Calculate and print the average accuracy across the 10 folds
average_accuracy <- mean(accuracies)
print(paste("Average Accuracy:", round(average_accuracy * 100, 2), "%"))
Output:
[1] "Average Accuracy: 96.67 %"
The average accuracy across all 10 folds is calculated and printed, providing an estimate of the model's performance.
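As an aside, e1071's svm() also exposes LibSVM's own cross-validation through the cross argument, which avoids the explicit loop entirely. The sketch below assumes the accuracies and tot.accuracy fields that e1071 documents for classification fits with cross > 0; note that the folds are assigned internally by the routine, not by caret.
R
# Alternative sketch: LibSVM's built-in 10-fold cross-validation
set.seed(123)
cv_model <- svm(X, y, type = 'C-classification', kernel = 'linear',
                cross = 10)

cv_model$accuracies    # per-fold accuracies, in percent
cv_model$tot.accuracy  # overall 10-fold CV accuracy, in percent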
Conclusion
Performing 10-fold cross-validation in R is straightforward with the e1071 package, which interfaces LibSVM, together with the caret package for constructing the folds. By following the steps and examples provided in this article, you can ensure that your models are robustly evaluated and optimized.
By integrating cross-validation into your modeling workflow, you can improve the reliability and performance of your predictive models.