How to Perform 10-Fold Cross-Validation with LibSVM in R?
Support Vector Machines (SVM) are a powerful tool for classification and regression tasks. LibSVM is a widely used library that implements SVM, and it can be accessed in R through the e1071 package. Cross-validation, particularly 10-fold cross-validation, is an essential technique for assessing the performance and generalizability of a model. This article will explain the theory behind 10-fold cross-validation and demonstrate how to perform it using LibSVM in R Programming Language.
Overview of 10-Fold Cross-Validation
Cross-validation is a statistical method used to estimate the skill of a model on unseen data. It is commonly used to assess the effectiveness of machine learning models.
- Cross-Validation: A technique to evaluate the performance of a model by partitioning the data into subsets, training the model on some subsets, and validating it on the remaining subsets.
- K-Fold Cross-Validation: Involves splitting the dataset into K equally-sized folds. The model is trained K times, each time using a different fold as the validation set and the remaining K-1 folds as the training set.
- 10-Fold Cross-Validation: A specific case of K-Fold Cross-Validation where K = 10. This widely used choice balances computational cost against the accuracy of the performance estimate; a minimal base-R sketch of the idea follows this list.
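Conceptually, K-fold cross-validation only needs a random partition of the row indices into K groups. Before turning to the packages used in the rest of this article, here is a minimal base-R sketch of that idea (using the 150-row iris data for concreteness; the article itself builds folds with caret below):
R
# Minimal base-R sketch of a K-fold split (illustration only)
set.seed(42)
k <- 10
n <- nrow(iris)                              # 150 observations
fold_id <- sample(rep(1:k, length.out = n))  # randomly assign each row to a fold

# Rows with fold_id == i form the validation set of fold i
table(fold_id)                               # roughly equal fold sizes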
The following steps show how to perform 10-fold cross-validation with LibSVM in R Programming Language.
Step 1: Getting Started with e1071 and LibSVM
To perform 10-fold cross-validation with LibSVM, we need to install and load the e1071 package:
R
install.packages("e1071")
library(e1071)
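If the package is already installed, the install.packages() call can be skipped. A common guard pattern (one of several equivalent idioms) installs only when the package is missing:
R
# Install e1071 once if missing, then load it
if (!requireNamespace("e1071", quietly = TRUE)) {
  install.packages("e1071")
}
library(e1071)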
Step 2: Prepare the Data
We will use the iris dataset for this example. The dataset will be split into features and labels.
R
# Load the iris dataset
data(iris)
# Split the dataset into features (X) and labels (y)
X <- iris[, -5]
y <- iris$Species
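Before creating folds, it is worth confirming the shape of the feature matrix and the class balance; the quick check below uses only base R:
R
# Quick sanity check of the features and labels
dim(X)    # 150 rows, 4 numeric feature columns
table(y)  # 50 setosa, 50 versicolor, 50 virginica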
Step 3: Define the Cross-Validation Folds
Create 10 folds for cross-validation using the createFolds function from the caret package.
R
# Install and load the caret package
install.packages("caret")
library(caret)
# Create 10 folds
set.seed(123)
folds <- createFolds(y, k = 10, list = TRUE)
folds
Output:
$Fold01
[1] 1 18 26 28 45 57 59 61 98 100 126 128 129 132 143
$Fold02
[1] 5 27 29 39 46 66 74 84 86 97 101 105 106 136 149
$Fold03
[1] 4 6 32 35 47 51 53 82 88 91 114 137 138 140 146
$Fold04
[1] 3 23 33 48 50 71 75 81 85 95 102 103 113 121 130
$Fold05
[1] 2 9 12 24 42 55 58 67 77 78 109 112 125 131 144
$Fold06
[1] 10 21 38 44 49 60 62 64 83 90 119 120 122 127 142
$Fold07
[1] 7 11 19 25 43 54 70 89 92 93 108 115 118 123 124
$Fold08
[1] 8 14 17 37 41 52 56 65 73 96 135 139 141 145 148
$Fold09
[1] 15 16 22 30 40 69 72 79 87 99 116 117 134 147 150
$Fold10
[1] 13 20 31 34 36 63 68 76 80 94 104 107 110 111 133
The createFolds function from the caret package divides the 150 observations into 10 folds; each fold stores the row indices that will serve as the validation set in one iteration of the cross-validation loop.
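Because y is a factor, createFolds performs a stratified split, keeping the three species roughly balanced within each fold. You can verify the fold sizes and the class balance of any fold directly:
R
# Each fold should contain about 15 indices (150 / 10)
sapply(folds, length)

# Class balance inside the first fold (roughly 5 per species)
table(y[folds$Fold01])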
Step 4: Perform 10-Fold Cross-Validation
Train and evaluate the SVM model using LibSVM for each fold.
R
# Initialize a vector to store the accuracy for each fold
accuracies <- c()

# Loop over each fold
for (i in 1:10) {
  # Split the data into training and test sets
  test_indices <- folds[[i]]
  train_data <- X[-test_indices, ]
  train_labels <- y[-test_indices]
  test_data <- X[test_indices, ]
  test_labels <- y[test_indices]

  # Train the SVM model using LibSVM (via e1071)
  svm_model <- svm(train_data, train_labels, type = 'C-classification',
                   kernel = 'linear')

  # Make predictions on the held-out test fold
  predictions <- predict(svm_model, test_data)

  # Calculate and store the accuracy for this fold
  accuracy <- sum(predictions == test_labels) / length(test_labels)
  accuracies <- c(accuracies, accuracy)
}

# Inspect the model trained in the final iteration
summary(svm_model)
Output:
Call:
svm.default(x = train_data, y = train_labels, type = "C-classification",
kernel = "linear")
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 1
Number of Support Vectors: 28
( 2 15 11 )
Number of Classes: 3
Levels:
setosa versicolor virginica
For each fold, the model is trained on the remaining 9 folds and evaluated on the held-out fold; the summary shown above describes the model fitted in the final iteration of the loop.
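The loop above keeps the cost parameter at its default (cost = 1). If you also want to select cost by cross-validation, e1071 provides the tune.svm() helper, whose default resampling scheme is 10-fold cross-validation; below is a minimal sketch, where the cost grid is just an illustrative choice:
R
# Tune the cost parameter; tune.control() defaults to 10-fold CV
set.seed(123)
tuned <- tune.svm(X, y, type = "C-classification", kernel = "linear",
                  cost = 10^(-1:2))
summary(tuned)          # cross-validated error for each cost value
tuned$best.parameters   # cost with the lowest CV error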
Step 5: Evaluate the Model Performance
With the per-fold accuracies collected in the loop, we average them to obtain an overall estimate of model performance.
R
# Calculate and print the average accuracy across the 10 folds
average_accuracy <- mean(accuracies)
print(paste("Average Accuracy:", round(average_accuracy * 100, 2), "%"))
Output:
[1] "Average Accuracy: 96.67 %"
The average accuracy across all 10 folds is calculated and printed, providing an estimate of the model's performance.
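As an alternative to the manual loop, svm() itself accepts a cross argument that runs k-fold cross-validation internally; for classification, the fitted object then carries the per-fold and overall accuracies (the internal fold assignment is random, so the numbers will not exactly match the manual loop):
R
# Built-in 10-fold cross-validation in e1071's svm()
set.seed(123)
cv_model <- svm(X, y, type = "C-classification", kernel = "linear",
                cross = 10)
cv_model$accuracies    # accuracy for each of the 10 folds
cv_model$tot.accuracy  # overall cross-validated accuracy (in percent)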
Conclusion
Performing 10-fold cross-validation in R is straightforward with the e1071 package, which provides an interface to LibSVM, together with the caret package's createFolds function for building the folds. By following the steps and examples provided in this article, you can ensure that your models are robustly evaluated.
By integrating cross-validation into your modeling workflow, you can improve the reliability of your performance estimates and catch overfitting before a model is deployed.