SVM Model Selection Using PSO for Learning Handwritten Arabic Characters
CMC, vol. 61, no. 3, pp. 995-1008, 2019
Abstract: Using a Support Vector Machine (SVM) requires the selection of several
parameters, such as the multi-class strategy type (one-against-all or one-against-one), the
regularization parameter C, the kernel function and its parameters. The choice of these
parameters has a great influence on the performance of the final classifier. This paper
considers the grid search method and the particle swarm optimization (PSO) technique,
which make it possible to quickly scan a large space of SVM parameters and select the
best configuration. A comparative study of the SVM models is also presented to examine
the convergence speed and the results of each model. SVM is applied to the learning of
handwritten Arabic characters, with a database containing 4840 Arabic characters in their
different positions (isolated, beginning, middle and end). Some very promising results
have been achieved.
1 Introduction
Research on Arabic character recognition reveals a rapidly expanding field whose
relevance is now undisputed by the research community, which has devoted its efforts to
reducing constraints and broadening the scope of Arabic character recognition.
Among the techniques used for Arabic handwriting recognition is the SVM, introduced in
the early 1990s by Boser et al. [Boser, Guyon and Vapnik (1992); Cortes and Vapnik
(1995)], which has been very successful in many areas of machine learning. Today, it can
be said without exaggeration that these machines have replaced neural networks and
other learning techniques.
The adjustment of the hyper-parameters of the SVM classifier is a crucial step in building an
effective recognition system. For a long time, model selection was carried out by a “grid
search” method, in which a systematic search is implemented by discretizing the parameter
space with a fixed step [Xiao, Ren, Lei et al. (2014); Wojciech, Sabina and Andrzej (2015)].
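As an illustration, the fixed-step discretization described above can be sketched in a few lines. The evaluation function below is a toy stand-in for cross-validation accuracy (no actual SVM is trained), and all names are of our own choosing:

```python
import itertools
import math

def grid_search(evaluate, log10_C_steps, log10_gamma_steps):
    # Systematic search: score every (C, gamma) pair on a fixed-step log grid
    # and keep the best-scoring configuration.
    best_params, best_score = None, float("-inf")
    for lc, lg in itertools.product(log10_C_steps, log10_gamma_steps):
        params = {"C": 10.0 ** lc, "gamma": 10.0 ** lg}
        score = evaluate(params)  # e.g. cross-validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for cross-validation accuracy, peaking at C = 10, gamma = 1e-5
# (for illustration only; a real run would train and validate an SVM here).
def toy_accuracy(p):
    return (100.0 - (math.log10(p["C"]) - 1.0) ** 2
            - (math.log10(p["gamma"]) + 5.0) ** 2)

params, score = grid_search(toy_accuracy, range(0, 16), range(-15, 3))
```

The grid covers one step per decade; a finer step multiplies the number of evaluations accordingly, which is why this method is costly on large search spaces.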
More recently, model selection has been considered as an optimization task. In this
context, an optimization algorithm is implemented in order to find all the hyper-
parameters that achieve the best classification performance. Among the existing
optimization algorithms, the gradient descent method has often been used for SVM
model selection [Ayat, Cheriet and Suen (2005); Jiang and Siddiqui (2019)].

1 Département Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf USTO-MB, BP 1505 El M’naoeur, 31000, Oran, Algérie.
* Corresponding Author: Mamouni El Mamoun. Email: [email protected].
Metaheuristic techniques were also used for SVM model selection. Genetic algorithms
[Sun, Guo, Wang et al. (2017); Phan, Nguyen and Bui (2017)], evolutionary strategies
[Liu, Liu, Yang et al. (2006); Phienthrakul and Kijsirikul (2010)] and taboo search
metaheuristic [Zennaki, Mamouni and Sadouni (2013); Corazza, Di Martino, Ferrucci et
al. (2013)] were used to find the best configuration of SVM parameters.
In this work, the PSO technique was adapted for parameter selection in order to
maximize the cross-validation accuracy, and a comparative study of different SVM
models is presented.
The rest of the article is organized as follows. Section 2 presents the SVM and the two multi-
class approaches, one-against-all and one-against-one. The PSO method is described in
Section 3. Section 4 provides a brief description of the proposed recognition system. Section
5 describes the experimental results. Finally, Section 6 draws the conclusions of this study.
2 Support vector machines

Figure 1: Representation of the hyperplane separating the data in the variable space (classes +1 and −1)
2.1 One-against-all
This is the simplest and oldest method. According to Vapnik’s formulation [Vapnik
(1998)], the idea is to determine, for each class k, a hyperplane H_k(w_k, b_k)
separating it from all the other classes. Class k is considered as the positive class (+1)
and the other classes as the negative class (−1), so for a problem of K classes, K binary
SVMs are obtained. Fig. 2 shows a case of separation of three classes.
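A minimal sketch of this decision rule follows. The decision functions below are toy stand-ins (scores based on hypothetical class centers of our own choosing), not trained SVMs:

```python
def one_against_all_predict(x, decision_fns):
    # decision_fns[k] plays the role of f_k(x) = <w_k, x> + b_k, the decision
    # value of the binary SVM separating class k (+1) from the rest (-1).
    # The example is assigned to the class with the largest decision value.
    scores = [f(x) for f in decision_fns]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy stand-ins: one "decision function" per class, scoring by closeness to an
# assumed class center (for illustration only; not trained hyperplanes).
centers = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
decision_fns = [
    (lambda c: lambda x: -((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2))(c)
    for c in centers
]
predicted = one_against_all_predict((0.9, 0.1), decision_fns)  # class 0
```

Taking the maximum decision value resolves the ambiguity when several of the K binary classifiers answer positively for the same example.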
2.2 One-against-one
This approach consists of using one classifier for every pair of classes. It discriminates
each class from every other class, so K(K−1)/2 decision functions are learned.
For each pair of classes (k, s), this method defines a binary decision function. A new
example is assigned by voting: the example is tested against each hyperplane, and each
test casts a vote for the class to which the example appears to belong (the winning class).
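This voting scheme can be sketched as follows; the pairwise decision functions are toy stand-ins comparing distances to hypothetical class centers, not trained SVMs:

```python
from itertools import combinations
from collections import Counter

def one_against_one_predict(x, pairwise_fns):
    # pairwise_fns[(k, s)](x) > 0 is read as a vote for class k, otherwise as
    # a vote for class s; the class collecting the most votes wins.
    votes = Counter()
    for (k, s), f in pairwise_fns.items():
        votes[k if f(x) > 0 else s] += 1
    return votes.most_common(1)[0][0]

# Toy stand-ins: f_{k,s}(x) is positive when x lies closer to the assumed
# center of class k than to that of class s (for illustration only).
centers = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 1.0)}
def make_pair_fn(k, s):
    def f(x):
        dk = sum((a - b) ** 2 for a, b in zip(x, centers[k]))
        ds = sum((a - b) ** 2 for a, b in zip(x, centers[s]))
        return ds - dk
    return f

pairwise = {(k, s): make_pair_fn(k, s) for k, s in combinations(range(3), 2)}
# K = 3 classes give K(K-1)/2 = 3 binary classifiers.
```

Note that each binary problem involves only the examples of two classes, which is why the individual classifiers are cheaper to train than in the one-against-all case, even though there are more of them.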
3 Particle swarm optimization (PSO)

t = 0
Repeat
    For i = 1 To N
        Calculate the cross-validation accuracy of the SVM with parameters x_i
        If the accuracy of the SVM with parameters x_i is greater than the accuracy of the SVM with parameters p_i Then
            Update p_i = x_i
            If the accuracy of the SVM with parameters p_i is greater than the accuracy of the SVM with parameters g Then
                Update g = p_i
            End If
        End If
    End For
    For i = 1 To N
        Update x_i using Eqs. (7) and (8)
    End For
    t = t + 1
Until (t > T_max)
Return g
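As an illustration, the loop above can be turned into runnable code. The accuracy function below is a toy stand-in for the SVM cross-validation accuracy (no SVM is trained), the velocity and position updates stand in for Eqs. (7) and (8), and the coefficient values are the ones used in the experiments (ω = 0.7298, c1 = c2 = 1.4962):

```python
import random

def pso_select(accuracy, bounds, n_particles=5, t_max=200, seed=0,
               w=0.7298, c1=1.4962, c2=1.4962):
    # gbest PSO: each particle x_i encodes one hyper-parameter vector, p_i is
    # its personal best and g the best position found by the whole swarm.
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                      # personal bests
    p_acc = [accuracy(xi) for xi in x]
    best_i = max(range(n_particles), key=p_acc.__getitem__)
    g_pos, g_acc = p[best_i][:], p_acc[best_i]
    for _ in range(t_max):
        for i in range(n_particles):
            acc = accuracy(x[i])
            if acc > p_acc[i]:
                p[i], p_acc[i] = x[i][:], acc
                if acc > g_acc:
                    g_pos, g_acc = x[i][:], acc
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p[i][d] - x[i][d])
                           + c2 * r2 * (g_pos[d] - x[i][d]))
                x[i][d] += v[i][d]
                lo, hi = bounds[d]
                if not lo <= x[i][d] <= hi:      # left the search space:
                    x[i][d] = min(max(x[i][d], lo), hi)  # back to the boundary
                    v[i][d] = 0.0                        # and zero velocity
    return g_pos, g_acc

# Toy stand-in for SVM cross-validation accuracy over (log10 C, log10 gamma),
# peaking at C = 10, gamma = 1e-5 (not a real SVM evaluation).
toy = lambda z: 100.0 - (z[0] - 1.0) ** 2 - (z[1] + 5.0) ** 2
best, acc = pso_select(toy, bounds=[(0.0, 15.0), (-15.0, 2.0)])
```

In a real run, `accuracy` would train an SVM with the encoded parameters and return its cross-validation score, which makes each evaluation expensive; the small swarm size keeps the number of evaluations per iteration low.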
4 Recognition system of handwritten Arabic characters
In the context of this study, a character recognition system has been developed. This section
provides a brief description of the database and the technique used to extract the features.
A sizeable database was built in the SIMPA laboratory with the contribution of several
researchers and students; it contains 4840 examples of handwritten Arabic characters, as
shown in Fig. 4. The letters appear in different positions (isolated, beginning, middle and
end): the isolated letters account for 100×28 = 2800 images and the letters in the other
positions for 30×68 = 2040 images. The number of classes is 28+68 = 96.
5 Experimental results

Figure 6: Grid search results for RBF kernel for one-against-one approach (cross-validation accuracy in % over log10(C) and log10(γ))
Figure 7: Grid search results for Laplacian kernel for one-against-one approach (cross-validation accuracy in % over log10(C) and log10(γ))
For the one-against-all approach, the results obtained are illustrated as follows:
Figure 8: Grid search results for RBF kernel for one-against-all approach (cross-validation accuracy in % over log10(C) and log10(γ))
Figure 9: Grid search results for Laplacian kernel for one-against-all approach (cross-validation accuracy in % over log10(C) and log10(γ))
According to these results, the region of good results (accuracy > 90%) is larger for the
Laplacian kernel than for the RBF kernel, yet the RBF kernel gave the best results. The
same can be said of the two SVM approaches used: the region of good results is larger
for the one-against-all approach than for the one-against-one approach, yet the latter
gave the best results.
During these experiments, the CPU time was measured at each iteration; the results are
presented in Fig. 10. They show that the one-against-one approach (1×1) is faster than the
one-against-all approach (1×N), and that the Laplacian kernel is faster than the RBF kernel.
The one-against-all approach becomes very slow when the γ parameter takes large values
such as 1, 10 and 100; this is evident in Fig. 10 at iteration 18 and its multiples.
Figure 10: Time duration in seconds of each grid search iteration for the Laplacian and RBF kernels with the one-against-one (1×1) and one-against-all (1×N) approaches
Then, for the selection of the SVM model parameters, the PSO method was used, in
particular the gbest model [Frans (2006)]. For these experiments, 5 particles were used
with the coefficient values ω = 0.7298 and C1 = C2 = 1.4962, which ensure convergence
[Dang and Luong (2011)].
Each particle encodes the SVM model parameters; the number of these parameters varies
according to the type of kernel. The values considered are C ∈ [10^0, 10^15],
γ ∈ [10^−15, 10^2], coef ∈ [−10, 10^2] and d ∈ [1, 10]. If a particle exceeds the search
space boundary, it is repositioned at the boundary and its velocity is set to zero.
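This boundary handling can be sketched as a small helper (the function name and the log-scale example box are of our own choosing):

```python
def repair_particle(position, velocity, bounds):
    # If a coordinate leaves the search space, reposition the particle at the
    # boundary and set the corresponding velocity component to zero.
    for d, (lo, hi) in enumerate(bounds):
        if position[d] < lo or position[d] > hi:
            position[d] = lo if position[d] < lo else hi
            velocity[d] = 0.0
    return position, velocity

# Example over an assumed (log10 C, log10 gamma) box: [0, 15] x [-15, 2].
pos, vel = repair_particle([16.0, -20.0], [1.0, -2.0],
                           [(0.0, 15.0), (-15.0, 2.0)])
# pos == [15.0, -15.0], vel == [0.0, 0.0]
```

Zeroing the velocity prevents the particle from being pushed straight back out of the box on the next update.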
The results obtained for the one-against-one approach are expressed as follows:
Figure 11: Cross-validation accuracy in % of each kernel (Sigmoid, Laplacian, Polynomial, Linear, Gaussian) per iteration for one-against-one approach

Figure 12: Time duration in seconds of each kernel for one-against-one approach
For the one-against-all approach, the results obtained are expressed as follows:
Figure 13: Cross-validation accuracy in % of each kernel (Sigmoid, Laplacian, Polynomial, Linear, Gaussian) per iteration for one-against-all approach

Figure 14: Time duration in seconds of each kernel for one-against-all approach
According to the results obtained, the linear kernel converges rapidly towards the
optimum because the SVM model has only one parameter (C), and the value of this
parameter does not significantly affect the model’s behavior. On the other hand, the
sigmoid kernel is the slowest because, in this case, there are three parameters and the
model is very sensitive to changes in these parameters. The best result in terms of
recognition rate is obtained with the RBF kernel and the one-against-one approach.
Finally, it can be noted that grid search is an effective technique for providing an
overview of the search space (it allows promising regions to be extracted); to obtain
good results, it is then necessary to apply a local search or to run the grid search a
second time within the promising area. However, this method requires a lot of time to
explore the search space, whereas PSO produces good results quickly.
6 Conclusions
Based on the experimental results obtained, the use of the SVM approach for Arabic
character recognition is strongly recommended due to its superior ability to generalize
when classifying high-dimensional data, even with a large number of classes (96 classes
in this case study).
It is also important to note that the RBF kernel is the most suitable for the recognition of
handwritten Arabic characters. Indeed, this kernel achieved better results than the other
kernels, with a cross-validation accuracy of 95.49% (SVM one-against-one, γ = 1.75×10^−5).
Moreover, PSO proved more effective than grid search in selecting SVM models.
References
Anitha Mary, M. O. C.; Dhanya, P. M. (2015): A comparative study of different
feature extraction techniques for offline Malayalam character recognition. Computational
Intelligence in Data Mining, vol. 2, pp. 9-18.
Ayat, N. E.; Cheriet, M.; Suen, C. Y. (2005): Automatic model selection for the
optimization of SVM kernels. Pattern Recognition, vol. 38, no. 10, pp. 1733-1745.
Boser, B.; Guyon, I.; Vapnik, V. (1992): A training algorithm for optimal margin
classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning
Theory, pp. 144-152.
Corazza, A.; Di Martino, S.; Ferrucci, F.; Gravino, C.; Sarro, F. et al. (2013): Using
tabu search to configure support vector regression for effort estimation. Empirical
Software Engineering, vol. 18, no. 3, pp. 506-546.
Cortes, C.; Vapnik, V. (1995): Support-vector networks. Machine Learning, vol. 20, no.
3, pp. 273-297.
Dang, H. N.; Luong, C. M. (2011): A new model of particle swarm optimization for
model selection of support vector machine. New Challenges for Intelligent Information
and Database Systems, pp. 167-173.
Dinesh, P. M.; Sabenian, R. S. (2017): Comparative analysis of zoning approaches for
recognition of Indo Aryan language using SVM classifier. Cluster Computing, pp. 1-8.
Frans, V. (2006): An Analysis of Particle Swarm Optimizers (Ph.D. Thesis). University
of Pretoria, South Africa.
Jiang, W.; Siddiqui, S. (2019): Hyper-parameter optimization for support vector
machines using stochastic gradient descent and dual coordinate descent. EURO Journal
on Computational Optimization, vol. 7, no. 1, pp. 1-17.
Joachims, T. (2008): SVMLight: Support Vector Machine. Cornell University.
http://svmlight.joachims.org/.
Kennedy, J.; Eberhart, R. (1995): Particle swarm optimization. Proceedings of the
IEEE International Conference on Neural Networks, pp. 1942-1948.
Li, J.; Li, B. (2014): Parameters selection for support vector machine based on particle
swarm optimization. International Conference on Intelligent Computing, pp. 41-47.
Liu, R.; Liu, E.; Yang, J.; Li, M.; Wang, F. (2006): Optimizing the hyper-parameters
for SVM by combining evolution strategies with a grid search. Intelligent Control and
Automation, pp. 712-721.
Phan, A. V.; Nguyen, M. L.; Bui, L. T. (2017): Feature weighting and SVM parameters
optimization based on genetic algorithms for classification problems. Applied
Intelligence, vol. 46, no. 2, pp. 455-469.
Phienthrakul, T.; Kijsirikul, B. (2010): Evolutionary strategies for hyperparameters of
support vector machines based on multi-scale radial basis function kernels. Soft
Computing, vol. 14, no. 7, pp. 681-699.
Sun, Y.; Guo, L.; Wang, Y.; Ma, Z.; Jin, S. (2017): The comparison of optimizing
SVM by GA and grid search. Proceedings of the IEEE International Conference on
Electronic Measurement & Instruments, pp. 354-360.
Vapnik, V. (1998): Statistical Learning Theory. John Wiley & Sons Inc.
Wojciech, M. C.; Sabina, P.; Andrzej, J. B. (2015): Robust optimization of SVM
hyperparameters in the classification of bioactive compounds. Journal of Cheminformatics,
vol. 7, no. 38, pp. 1-15.
Xiao, T.; Ren, D.; Lei, S.; Zhang, J.; Liu, X. (2014): Based on grid-search and PSO
parameter optimization for support vector machine. Proceedings of the IEEE World
Congress on Intelligent Control and Automation, pp. 1529-1533.
Zennaki, M.; Mamouni, E. M.; Sadouni, K. (2013): A comparative study of SVM
models for learning handwritten Arabic characters. WSEAS Transactions on Advances in
Engineering Education, vol. 10, no. 1, pp. 32-43.