Gradient Based Particle Swarm
Article history:
Received 24 September 2010
Received in revised form 5 January 2011
Accepted 14 August 2011
Available online 1 September 2011

Keywords:
Particle swarm optimization (PSO)
Gradient descent
Global optimization techniques
Stochastic optimization

Abstract

Stochastic optimization algorithms like genetic algorithms (GAs) and particle swarm optimization (PSO) algorithms perform global optimization but waste computational effort by doing a random search. On the other hand, deterministic algorithms like gradient descent converge rapidly but may get stuck in local minima of multimodal functions. Thus, an approach that combines the strengths of stochastic and deterministic optimization schemes but avoids their weaknesses is of interest. This paper presents a new hybrid optimization algorithm that combines the PSO algorithm and gradient-based local search algorithms to achieve faster convergence and better accuracy of the final solution without getting trapped in local minima. In the new gradient-based PSO algorithm, referred to as the GPSO algorithm, the PSO algorithm is used for global exploration and a gradient based scheme is used for accurate local exploration. The global minimum is located by a process of finding progressively better local minima. The GPSO algorithm avoids the use of inertial weights and constriction coefficients which can cause the PSO algorithm to converge to a local minimum if improperly chosen. The De Jong test suite of benchmark optimization problems was used to test the new algorithm and facilitate comparison with the classical PSO algorithm. The GPSO algorithm is compared to four different refinements of the PSO algorithm from the literature and shown to converge faster to a significantly more accurate final solution for a variety of benchmark test functions.

© 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.asoc.2011.08.037
(referred to as NM-PSO) is presented in [14]. NM-PSO starts with 3N+1 particles, where N is the dimension of the search space. The entire population is sorted according to fitness and the best N+1 particles are updated using the Nelder–Mead Simplex algorithm. The remaining 2N particles are updated using the classical PSO algorithm. In [14] the NM-PSO is shown to converge faster to a more accurate final solution compared to the Nelder–Mead Simplex and PSO algorithms for a variety of low dimensional (≤10) test functions. The GPSO algorithm proposed in this paper is compared to the NM-PSO algorithm and shown to perform significantly better in Section 5.
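The particle partition used by NM-PSO can be sketched as follows in Python. This is a loose illustration, not the exact procedure of [14]: the top N+1 particles are treated as an initial simplex for a few SciPy Nelder–Mead iterations, and the pso_update callback standing in for the classical PSO update is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import minimize

def nm_pso_round(X, f, pso_update, nm_iters=5):
    """One round of the NM-PSO idea (simplified sketch): the best N+1 particles
    form a simplex that is improved by a few Nelder-Mead iterations; the
    remaining 2N particles follow an ordinary PSO update supplied by the caller."""
    N = X.shape[1]                                   # X has shape (3N+1, N)
    X = X[np.argsort([f(x) for x in X])]             # sort particles by fitness
    res = minimize(f, X[0], method="Nelder-Mead",
                   options={"initial_simplex": X[:N + 1], "maxiter": nm_iters})
    X[:N + 1], _ = res.final_simplex                 # replace simplex vertices
    X[N + 1:] = pso_update(X[N + 1:])                # classical PSO for the rest
    return X
```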
2. The PSO algorithm

Consider the following general unconstrained function optimization problem:

minimize f(x1, x2, . . . , xN)    (1)

where f : R^N → R.

Tentative solutions to this problem will be real vectors of length N. The PSO algorithm starts with a population of points (also referred to as particles) randomly initialized in the search space. The particles are moved according to rules inspired by bird flocking behavior. Each particle is moved towards a randomly weighted average of the best position encountered by that particle so far and the best position found by the entire population of particles according to:

Vid(k + 1) = Vid(k) + φ1d(Pid(k) − Xid(k)) + φ2d(Gd(k) − Xid(k))    (2)
Xid(k + 1) = Xid(k) + Vid(k)

where Vi = [Vi1 Vi2 . . . ViN] is the velocity of particle i; Xi = [Xi1 Xi2 . . . XiN] is the position of particle i; φ1d and φ2d are uniformly distributed random numbers generated independently for each dimension; Pi = [Pi1 Pi2 . . . PiN] is the best position found by particle i; G = [G1 G2 . . . GN] is the best position found by the entire population; N is the dimension of the search space and k is the iteration number. Vid is constrained to be bounded between Vmin and Vmax to prevent divergence of the swarm. When a particle position is updated, the global best solution is replaced by the new solution if f(Xi) < f(G). Thus, when a particle is updated it has the opportunity to learn from all particles previously updated, resulting in a high convergence rate.

In the PSO described above, the global best solution found by the swarm so far, G, is used to update all particles. The use of G leads to communication bottlenecks in parallel implementations and might result in convergence to a local minimum if the initial solutions are near a local minimum. Other approaches in which the swarm is divided into subpopulations and subpopulation bests are used to update particles have been explored [8], but the use of multiple subpopulations results in reduced convergence rates. In this paper the classical global version will be used, although the approach presented in this paper can be used with other population topologies as well.

Performance analysis of the PSO algorithm shows that although the particles reach the vicinity of the global minimum faster than with evolutionary computation techniques, the convergence after that is slower [6,7]. This is because the particles are not constrained to take smaller steps as they near the global minimum.
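As an illustration of update rule (2), the following Python sketch performs one iteration of the global-best PSO without inertia weight or constriction coefficient. The uniform [0, 1] range for φ1d and φ2d and the personal-best bookkeeping are standard PSO choices assumed here, since the text does not fix them.

```python
import numpy as np

def pso_step(X, V, P, G, f, v_min, v_max):
    """One iteration of the global-best PSO update (2): no inertia weight,
    no constriction coefficient.  X, V, P are (num_particles, N) arrays,
    G is the (N,) global best position, f is the cost function."""
    num_particles, N = X.shape
    for i in range(num_particles):
        # phi_1d, phi_2d: uniform random numbers drawn independently per dimension
        phi1 = np.random.rand(N)
        phi2 = np.random.rand(N)
        V[i] = V[i] + phi1 * (P[i] - X[i]) + phi2 * (G - X[i])
        V[i] = np.clip(V[i], v_min, v_max)        # keep velocities bounded
        X[i] = X[i] + V[i]
        if f(X[i]) < f(P[i]):                     # update personal best
            P[i] = X[i].copy()
        if f(X[i]) < f(G):                        # update global best immediately,
            G[:] = X[i]                           # so later particles learn from it
    return X, V, P, G
```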
3. Exploration versus exploitation: techniques to improve accuracy of the final solution

To accurately locate the global minimum with the PSO algorithm, the step size has to be reduced when particles approach the global minimum. This is usually done by incorporating an inertial weight w(k) (0 < w(k) < 1), as in (3), or a constriction coefficient χ < 1, as in (4) [9–11]:

Vid(k + 1) = w(k)Vid(k) + φ1d(Pid(k) − Xid(k)) + φ2d(Gd(k) − Xid(k))    (3)
Xid(k + 1) = Xid(k) + Vid(k)

Vid(k + 1) = χ(Vid(k) + φ1d(Pid(k) − Xid(k)) + φ2d(Gd(k) − Xid(k)))    (4)
Xid(k + 1) = Xid(k) + Vid(k)

Use of an inertial weight or constriction coefficient essentially narrows the region of search as the search progresses. In (3) the inertial weight w(k) is initially set to a value of 1 but is reduced towards 0 as the iterations progress. A value near 1 encourages exploration of new regions of the search space while a small value encourages exploitation, or detailed search, of the current region. In (4) the value of the constriction coefficient χ is less than 1, which reduces the velocity step size to zero as the iterations progress. This is because when the velocity is repeatedly multiplied by a factor less than one (χ or w(k)) the velocity becomes small and consequently the region of search is narrowed.

However, since the location of the global minimum is not known a priori, reducing the step size to a small value too soon might result in premature convergence to a local minimum. Also, a step size that is too small discourages exploration and wastes computational effort. Thus, to accurately locate the global minimum without getting trapped in local minima, the step size must be reduced only in the neighborhood of a tentative global minimum. However, there is no procedure that allows the values of the inertial weight or the constriction coefficient to be set or adjusted over the optimization period to manipulate the step size as needed to maintain an optimal balance between exploration and exploitation in the PSO algorithm. Proper choice of inertial weights or constriction coefficients is problematic because the number of iterations necessary to locate the global minimum is not known a priori. This paper presents an approach where the balance between exploration and exploitation is achieved by using the PSO algorithm without a constriction coefficient or inertial weight for global exploration and a deterministic local search algorithm for accurate location of good local minima. This approach is advantageous because it allows for exploration of new regions of the search space while retaining the ability to improve good solutions already found.
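For comparison with (2), the sketch below implements the velocity updates (3) and (4). The linearly decreasing schedule for w(k) and the example value χ = 0.729 are common choices in the PSO literature and are assumptions, not values taken from this paper.

```python
import numpy as np

def velocity_inertia(V_i, X_i, P_i, G, k, k_max, w_start=1.0, w_end=0.0):
    """Velocity update (3) with an inertia weight w(k) that decreases
    linearly from w_start to w_end; the linear schedule is illustrative."""
    w_k = w_start - (w_start - w_end) * k / k_max
    phi1, phi2 = np.random.rand(X_i.size), np.random.rand(X_i.size)
    return w_k * V_i + phi1 * (P_i - X_i) + phi2 * (G - X_i)

def velocity_constriction(V_i, X_i, P_i, G, chi=0.729):
    """Velocity update (4) with a constriction coefficient chi < 1.
    chi = 0.729 is a commonly quoted value, used here only as an example."""
    phi1, phi2 = np.random.rand(X_i.size), np.random.rand(X_i.size)
    return chi * (V_i + phi1 * (P_i - X_i) + phi2 * (G - X_i))
```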
4. A new gradient based PSO (GPSO) algorithm

In a classical gradient descent scheme [12] a point is chosen randomly in the search space and small steps are made in the direction of the negative of the gradient according to (5):

X(k + 1) = X(k) − η∇C(X(k))    (5)

where η is the step size, X(k) is the approximation to the local minimum at iteration k and ∇C(X(k)) is the gradient of the cost function evaluated at X(k). The gradient is the row vector composed of the first order partial derivatives of the cost function with respect to X. The gradient can be computed using a forward difference approximation for the partial derivatives, or by more advanced methods. With gradient descent, a large η will result in fast convergence of X(k) towards the minimum, but once in the vicinity of the nearest local minimum oscillations will occur as X(k) overshoots the minimum. On the other hand, a small step size will result in slow convergence towards the minimum but the final solution will be more accurate. Since the choice of η is problematic, a random step size can be used; for example, a step size uniformly distributed in the interval [0, 0.5] might be used. Since the negative gradient always points in the direction of steepest decrease in the function, the nearest local minimum will be reached eventually. Since the gradient is zero at a local minimum, smaller steps will automatically be taken when a minimum is approached. Also, movement in a direction other than the direction defined by the negative gradient will result in a smaller decrease in the value of the cost function. Thus, in the case of a function with a single minimum (unimodal function) the gradient descent algorithm converges faster than stochastic search algorithms, because stochastic search algorithms waste computational effort doing a random search. For multimodal functions, the gradient descent algorithm will converge to the local minimum nearest to the starting point.
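A minimal sketch of the descent rule (5) with a forward difference gradient approximation follows. The fixed step size eta = 0.1 and the difference increment h are illustrative values only.

```python
import numpy as np

def forward_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x with forward differences,
    one extra function evaluation per dimension."""
    fx = f(x)
    grad = np.zeros_like(x)
    for d in range(x.size):
        x_h = x.copy()
        x_h[d] += h
        grad[d] = (f(x_h) - fx) / h
    return grad

def gradient_descent(f, x0, num_iters=100, eta=0.1):
    """Iterate X(k+1) = X(k) - eta * grad C(X(k)) as in (5).
    A fixed eta is used here; the text also suggests drawing the
    step size uniformly from an interval such as [0, 0.5]."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        x = x - eta * forward_difference_gradient(f, x)
    return x
```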
In the new GPSO algorithm, the PSO algorithm is first used to approximately locate a good local minimum. Then a gradient based local search is done with the best solution found by the PSO algorithm as its starting point. If the best solution found by the PSO algorithm (G) has a larger cost than the final solution found by the local search during the previous iteration (L), then L is used as the starting point for the local search. This ensures that the local search is done in the neighborhood of the best solution found by the GPSO in all previous iterations. Thus, the PSO algorithm is used to reach the vicinity of a good local minimum and the gradient descent scheme is used to find the local minimum accurately. Next, this accurately computed local minimum is used as the global best solution in the PSO algorithm to identify […]

[Figure: flowchart of the GPSO algorithm. Legible fragments: "Set L = G", "Set i = i + 1", decision "i ≤ NP" (YES/NO), "Set L = L + 1".]
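The GPSO loop described above can be sketched as follows, reusing the pso_step and gradient_descent sketches given earlier. The particle count, iteration counts and velocity bounds are illustrative; the paper's experiments use their own settings (e.g., 20 particles, with local search interleaved as in Fig. 2).

```python
import numpy as np

def gpso(f, bounds, num_particles=20, outer_iters=100,
         pso_iters_per_round=10, local_iters_per_round=50):
    """Sketch of the GPSO loop: PSO for global exploration, gradient descent
    (see gradient_descent above) for local refinement of the best solution."""
    low, high = bounds
    N = low.size
    X = low + (high - low) * np.random.rand(num_particles, N)
    V = np.zeros((num_particles, N))
    P = X.copy()
    G = min(X, key=f).copy()          # global best found by the swarm
    L = G.copy()                      # best point refined by local search so far
    for _ in range(outer_iters):
        for _ in range(pso_iters_per_round):
            X, V, P, G = pso_step(X, V, P, G, f,
                                  v_min=-(high - low), v_max=high - low)
        # start the local search from G, or from L if L is still better
        start = G if f(G) < f(L) else L
        L = gradient_descent(f, start, num_iters=local_iters_per_round)
        if f(L) < f(G):
            G = L.copy()              # feed the refined minimum back into the swarm
    return G
```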
Table 1
Test functions used for comparison of GPSO and PSO algorithms.

Test function    Range of search    Equation
Rosenbrock       [−100,100]^N       f3(x) = Σ_{i=1..N−1} [100(x_{i+1} − x_i^2)^2 + (1 − x_i)^2]
Rastrigin        [−100,100]^N       f4(x) = 10N + Σ_{i=1..N} [x_i^2 − 10 cos(2πx_i)]
Griewangk        [−600,600]^N       f5(x) = 1 + (1/4000) Σ_{i=1..N} x_i^2 − Π_{i=1..N} cos(x_i/√i)

Fig. 2. Solid line shows convergence of the GPSO algorithm with 50 local search iterations done every 10 PSO iterations. Dotted line shows convergence of the GPSO algorithm with 5 local search iterations done every PSO iteration. The total numbers of PSO and local search iterations are 100 and 500, respectively, in both cases. Rastrigin's test function was used.

Fig. 4. Mean best solution versus number of function evaluations for the Ellipsoid test function.

Fig. 7. Mean best solution versus number of function evaluations for the Griewangk test function.
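For reference, the three test functions recovered in Table 1 can be written in Python as below; the vectorized NumPy form and the function names are ours, not the paper's.

```python
import numpy as np

def rosenbrock(x):
    # f3(x) = sum_{i=1}^{N-1} [100*(x_{i+1} - x_i^2)^2 + (1 - x_i)^2]
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rastrigin(x):
    # f4(x) = 10*N + sum_i [x_i^2 - 10*cos(2*pi*x_i)]
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def griewangk(x):
    # f5(x) = 1 + (1/4000)*sum_i x_i^2 - prod_i cos(x_i / sqrt(i))
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))
```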
Fig. 6. Mean best solution versus number of function evaluations for the Rastrigin test function.

[…] exploring regions with steep gradients since gradient information is not used.

Figs. 6 and 7 show the performance of the GPSO and PSO algorithms for the multimodal Rastrigin and Griewangk test functions. These test functions have steep gradients throughout the solution space, and the GPSO converged significantly faster to a better solution than the PSO algorithm because the gradient […] (deep local minima) found by the PSO are refined using a deterministic gradient based local search technique with a high convergence rate, avoiding costly random search.

The GPSO was then compared to a hybrid PSO algorithm that employs the PSO for exploration and the derivative-free Nelder–Mead method for exploitation (NM-PSO) [14]. Table 3 shows the test functions used for comparison of the GPSO and NM-PSO algorithms.

Table 4 compares the average performance of the GPSO and NM-PSO algorithms over 100 independent runs. Table 4 shows that the GPSO algorithm converged to a significantly better solution than the NM-PSO algorithm [14] in fewer function evaluations for the Sphere, Rosenbrock, Griewangk, B2 and Zakharov test functions. For the Goldstein–Price and Easom test functions the GPSO converged to a more accurate solution but required more function evaluations. The NM Simplex algorithm used in the NM-PSO algorithm is computationally more efficient than the QNR algorithm used in the GPSO algorithm for lower dimensions. For higher dimensional problems the gradient based QNR algorithm performs significantly better than the NM Simplex algorithm. In accordance with this observation, the results presented in [14] indicate that the performance of the NM-PSO algorithm degrades with increasing dimension.
Table 2
Mean best solution found in 50 independent trials.

Table 3
Test functions used for comparison of GPSO and NM-PSO algorithms.

Test function      Range of search    Equation
Easom              [−100,100]^2       ES(x) = −cos(x1) cos(x2) exp(−((x1 − π)^2 + (x2 − π)^2))
Goldstein & Price  [−100,100]^2       GP(x) = [1 + (x1 + x2 + 1)^2 (19 − 14x1 + 3x1^2 − 14x2 + 6x1x2 + 3x2^2)] × [30 + (2x1 − 3x2)^2 (18 − 32x1 + 12x1^2 + 48x2 − 36x1x2 + 27x2^2)]
Zakharov           [−600,600]^N       ZN(x) = Σ_{j=1..N} x_j^2 + (Σ_{j=1..N} 0.5 j x_j)^2 + (Σ_{j=1..N} 0.5 j x_j)^4
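Illustrative Python definitions of the Table 3 functions, under the standard forms reconstructed above; the function names are ours.

```python
import numpy as np

def easom(x):
    # ES(x) = -cos(x1)*cos(x2) * exp(-((x1 - pi)^2 + (x2 - pi)^2))
    x1, x2 = x
    return -np.cos(x1) * np.cos(x2) * np.exp(-((x1 - np.pi) ** 2 + (x2 - np.pi) ** 2))

def goldstein_price(x):
    # GP(x) = [1 + (x1 + x2 + 1)^2 (19 - 14x1 + 3x1^2 - 14x2 + 6x1x2 + 3x2^2)]
    #       * [30 + (2x1 - 3x2)^2 (18 - 32x1 + 12x1^2 + 48x2 - 36x1x2 + 27x2^2)]
    x1, x2 = x
    a = 1 + (x1 + x2 + 1) ** 2 * (19 - 14 * x1 + 3 * x1 ** 2
                                  - 14 * x2 + 6 * x1 * x2 + 3 * x2 ** 2)
    b = 30 + (2 * x1 - 3 * x2) ** 2 * (18 - 32 * x1 + 12 * x1 ** 2
                                       + 48 * x2 - 36 * x1 * x2 + 27 * x2 ** 2)
    return a * b

def zakharov(x):
    # Z_N(x) = sum_j x_j^2 + (sum_j 0.5*j*x_j)^2 + (sum_j 0.5*j*x_j)^4
    x = np.asarray(x, dtype=float)
    j = np.arange(1, x.size + 1)
    s = np.sum(0.5 * j * x)
    return np.sum(x ** 2) + s ** 2 + s ** 4
```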
Table 4
Comparison of the performance of GPSO and NM-PSO algorithms.

Test function | Dim. | Mean function evaluations, GPSO | Mean function evaluations, NM-PSO [14] | Mean error, NM-PSO [14] | Mean error, GPSO
Tables 2 and 4 show that the quality of the final solution is significantly better for all test cases, although the GPSO algorithm required more function evaluations in a minority of cases.

Costly deterministic local search is done only around the global best solution in the case of the GPSO algorithm, while in the case of the NM-PSO algorithm roughly one third of the particles perform local search.

The NM-PSO algorithm requires 3N+1 particles to solve an N dimensional problem, making it computationally inefficient for higher dimensions. Thus, to optimize a test function of dimension 30, the NM-PSO requires 91 particles. On the other hand, Tables 1 and 2 show that good results can be obtained with the GPSO algorithm while using only 20 particles.

6. Conclusion

In this paper, a new hybrid optimization algorithm, referred to as the GPSO algorithm, that combines the stochastic PSO algorithm and the deterministic gradient descent algorithm is presented. The key ideas explored are the use of the PSO to provide an initial point within the domain of convergence of the quasi-Newton algorithm, synergistic interaction between local and global search, and avoidance of wastage of computation time in stochastic search (when local minima are not present). In the GPSO algorithm, a gradient descent scheme is used for accurate local exploration around the best solution to complement the global exploration provided by the PSO. This approach allows an accurate final solution to be computed while retaining the ability to explore better solutions. The GPSO algorithm was compared to four different refinements of the PSO algorithm from the literature and shown to perform significantly better for a variety of test functions. The GPSO algorithm avoids the use of constriction coefficients and inertial weights which, if improperly chosen, can result in premature convergence to a local minimum for the PSO algorithm.

Parallel implementations of the GPSO algorithm will be considered in future work. One strategy for parallelization is to divide the population into subpopulations that explore the search space in parallel. The best solution found by each subpopulation can be communicated to a central computation node periodically. This population of best solutions can be evolved using the GPSO algorithm.
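One possible reading of this parallelization strategy is sketched below with Python's multiprocessing, reusing the gpso and rastrigin sketches given earlier. The number of subpopulations, the test problem, and the single final collection step (rather than periodic exchange) are illustrative assumptions; the paper only outlines the strategy.

```python
from multiprocessing import Pool
import numpy as np

def run_subpopulation(seed):
    """Run an independent GPSO instance (see the gpso sketch above) on one
    subpopulation and return its best solution."""
    np.random.seed(seed)
    bounds = (np.full(30, -100.0), np.full(30, 100.0))   # example: 30-D Rastrigin
    return gpso(rastrigin, bounds, num_particles=20, outer_iters=25)

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # each worker evolves one subpopulation in parallel
        bests = pool.map(run_subpopulation, range(4))
    # the central node collects the subpopulation bests; this small population
    # of best solutions could then be evolved further with the GPSO algorithm
    final = min(bests, key=rastrigin)
```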
References

[1] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[2] T. Bäck, Evolutionary Algorithms in Theory and Practice, Oxford Univ. Press, New York, 1996.
[3] J. Kennedy, R.C. Eberhart, Swarm Intelligence, Morgan Kaufmann Academic Press, 2001.
[4] R.C. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in: Proceedings of the Sixth International Symposium on Micro Machines and Human Science, 4–6 October, 1995, pp. 39–43.
[5] R.C. Eberhart, Y. Shi, Particle swarm optimization: applications and resources, in: Proceedings of the 2001 Congress on Evolutionary Computation, vol. 1, 27–30 May, 2001, pp. 81–86.
[6] P.J. Angeline, Evolutionary optimization versus particle swarm optimization: philosophy and performance differences, in: Evolutionary Programming VII, Lecture Notes in Computer Science 1447, Springer, 1998, pp. 601–610.
[7] R.C. Eberhart, Y. Shi, Comparison between genetic algorithms and particle swarm optimization, in: Evolutionary Programming VII, Lecture Notes in Computer Science 1447, Springer, 1998, pp. 611–616.
[8] J. Kennedy, R. Mendes, Population structure and particle swarm performance, in: Proceedings of the 2002 Congress on Evolutionary Computation, vol. 2, 12–17 May, 2002, pp. 1671–1676.
[9] R.C. Eberhart, Y. Shi, A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary Computation, 1998, pp. 69–73.
[10] M. Clerc, J. Kennedy, The particle swarm: explosion, stability and convergence in a multidimensional complex space, IEEE Transactions on Evolutionary Computation 6 (February (1)) (2002) 58–73.
[11] R.C. Eberhart, Y. Shi, Parameter selection in particle swarm optimization, in: Evolutionary Programming VII, Lecture Notes in Computer Science 1447, Springer, 1998, pp. 591–600.
[12] R. Horst, H. Tuy, Global Optimization—Deterministic Approaches, Springer-Verlag, New York, 1996.
[13] A. Ratnaweera, S.K. Halgamuge, H.C. Watson, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on Evolutionary Computation 8 (June) (2004) 240–255.
[14] S.K. Fan, Y.C. Liang, E. Zahara, Hybrid simplex search and particle swarm optimization for the global optimization of multimodal functions, Engineering Optimization 36 (August (4)) (2004) 401–418.