
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 35, NO. 6, DECEMBER 2005

A Hierarchical Particle Swarm Optimizer and Its Adaptive Variant

Stefan Janson and Martin Middendorf, Member, IEEE

Manuscript received September 7, 2004; revised December 21, 2004 and January 28, 2005. This work was supported by the German Research Foundation (DFG) through the project "Swarm Intelligence on Reconfigurable Architectures". This paper was recommended by Associate Editor M. Dorigo. The authors are with the Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, D-04109 Leipzig, Germany. Digital Object Identifier 10.1109/TSMCB.2005.850530

Abstract—A hierarchical version of the particle swarm optimization (PSO) metaheuristic is introduced in this paper. In the new method, called H-PSO, the particles are arranged in a dynamic hierarchy that is used to define a neighborhood structure. Depending on the quality of their so-far best-found solution, the particles move up or down the hierarchy. This gives good particles that move up in the hierarchy a larger influence on the swarm. We introduce a variant of H-PSO in which the shape of the hierarchy is dynamically adapted during the execution of the algorithm. Another variant is to assign different behavior to the individual particles with respect to their level in the hierarchy. H-PSO and its variants are tested on a commonly used set of optimization functions and are compared to PSO using different standard neighborhood schemes.
I. INTRODUCTION

THE particle swarm optimization (PSO) method for function optimization has been introduced by Kennedy and Eberhart in [1] and is inspired by the emergent motion of a flock of birds searching for food. Like in other optimization metaheuristics ([2]), such as simulated annealing ([3], [4]), evolutionary algorithms ([5]–[8]), or ant colony optimization (ACO) ([9]–[11]), the search for an optimum is an iterative process that is based on random decisions. Another similarity of PSO to evolutionary algorithms and the population-based version of ACO ([12]) is that a population of solutions or agents is used, which cooperate in finding better solutions. Evolutionary algorithms use principles of natural evolution like mutation, crossover, and selection to obtain better solutions from the actual population of solutions. ACO is inspired by the foraging behavior of ants, which find short paths to food sources by marking their paths with pheromones. In ACO, a new solution is created by a simple agent, called an ant, that uses a constructive solution generation method. The decisions of the ant are guided by artificial pheromone information that stems from former ants that have found good solutions. A PSO algorithm iteratively explores a multidimensional search space with a swarm of individuals, referred to as particles, looking for the global minimum (or maximum). Each particle "flies" through the search space according to its velocity vector. In every iteration, the velocity vector is adjusted so that prior personal successful positions (cognitive aspect) and the best position found by particles within a specific neighborhood (social aspect) act as attractors. In this paper we concentrate on PSO for continuous search spaces, but it should be mentioned that PSO has also been applied to discrete optimization problems (e.g., [13]).

In the original PSO algorithm, the neighborhood of a particle consists of all particles, so that the global best position, i.e., the best solution found so far, directly influences its behavior. Several authors have investigated the use of restricted neighborhoods. In [14], several fixed neighborhoods, including random neighborhoods, have been studied. Dynamic neighborhoods, in which the neighborhood of a particle was defined as the closest individuals in every iteration, have been investigated in [15]. The use of such a dynamic neighborhood is computationally intensive because the neighborhood has to be determined anew at every iteration. In [16], the particles have been clustered in every iteration and the centroid of the cluster was used as an attractor instead of the position of a single individual.

In this paper, we propose a hierarchical version of PSO (H-PSO). In H-PSO, a particle is influenced by its own so-far best position and by the best position of the particle that is directly above it in the hierarchy. In H-PSO, all particles are arranged in a tree that forms the hierarchy, so that each node of the tree contains exactly one particle. In order to give the best particles in the swarm a high influence, particles move up and down the hierarchy. If a particle at a child node has found a solution that is better than the best so-far solution of the particle at the parent node, the two particles are exchanged. We also introduce variants of H-PSO in which the structure of the hierarchy is dynamically changed in order to further improve the search success. Moreover, variants of H-PSO are described in which the behavior of a particle is determined by its position in the hierarchy.

The paper is organized as follows. The PSO method is explained in Section II. The hierarchical PSO algorithm and its variants are described in Section III. In Section IV, the setup for the experiments is described and in Section V, the results are presented and discussed. A conclusion is given in Section VI.

II. PSO

In this section, we describe the PSO method for function optimization (see also [1]). PSO is an iterative method that is based on the search behavior of a swarm of particles in a multidimensional search space. In each iteration, the velocities and positions of all the particles are updated. For each particle $i$, its velocity vector $v_i$ is updated according to (1).
The inertia weight $w$ controls the influence of the previous velocity vector. The current position of particle $i$ is denoted by $x_i$. Parameter $c_1$ controls the impact of the personal best position $y_i$, i.e., the position where the particle found the smallest function value so far—assuming that the objective function $f$ has to be minimized. Parameter $c_2$ determines the impact of the best position $\hat{y}$ that has been found so far by any of the particles in the neighborhood of particle $i$. Usually $c_1$ and $c_2$ are set to the same value. Random values $r_1$ and $r_2$ are drawn with uniform probability from $[0,1]$ for each particle at every iteration:

$$v_i := w \cdot v_i + c_1 r_1 (y_i - x_i) + c_2 r_2 (\hat{y} - x_i) \qquad (1)$$

$$x_i := x_i + v_i \qquad (2)$$

After all the particles' velocities have been updated, the particles move with their new velocity to their new positions (2). Then, for each particle $i$ the objective function is evaluated at its new position. If $f(x_i) < f(y_i)$, the personal best position is updated accordingly, i.e., $y_i$ is set to $x_i$.
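The update rule is compact enough to state directly in code. The following is a minimal sketch of one iteration of the gbest model, assuming minimization, NumPy arrays of shape (m, n) for positions, velocities, and personal bests, and a placeholder objective f; none of these implementation details are prescribed by the paper.

```python
import numpy as np

def pso_step(x, v, y, f_y, f, w=0.729, c1=1.494, c2=1.494):
    """One iteration of gbest PSO, cf. (1) and (2); x, v, y are (m, n) arrays."""
    m, _ = x.shape
    y_hat = y[np.argmin(f_y)]              # global best position found so far
    r1 = np.random.rand(m, 1)              # one random value per particle
    r2 = np.random.rand(m, 1)
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)   # (1)
    x = x + v                                               # (2)
    f_x = np.apply_along_axis(f, 1, x)     # evaluate f at the new positions
    better = f_x < f_y                     # update the personal bests
    y[better], f_y[better] = x[better], f_x[better]
    return x, v, y, f_y
```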
Several variations of this basic PSO scheme have been proposed in the literature. Commonly used variations are to restrict the velocity of a particle by a maximal value $v_{max}$ or to linearly decrease $w$ over time [17]. This is done to adjust the swarm's behavior from exploration of the entire search space to exploitation of promising regions.

Various mechanisms have been designed to increase the diversity among the particles of a swarm. In [18], a spatial extension is assigned to the particles and different collision strategies are used to avoid crowding of the swarm. A charged swarm (CPSO) is proposed in [19], where some or all of the particles hold a certain charge and a repulsion force is applied if two particles get too close to each other. The ARPSO algorithm [20] switches between phases of attraction and repulsion. If the swarm becomes too tight, i.e., the diversity diminishes, the repulsion phase is initiated and the swarm is scattered. In [21], a predator particle is introduced that pursues the global best particle and thereby chases other particles away.

A. Neighborhood Topologies

Different neighborhood topologies have been investigated for PSO. In the original PSO algorithm—here called the gbest model—the swarm is guided by the current global best particle, i.e., $\hat{y}$ in (1) is the best solution found so far by the swarm. The gbest model corresponds to a fully connected neighborhood. In [22], other neighborhood topologies, varying the degree of interconnections between the particles, have been introduced. In this paper, we also consider the lbest model, which uses the local neighborhood best position to update a particle's velocity. The local neighborhood is defined by a ring topology through the particle's index, so that particle $i$ is neighbored to particles $(i-1) \bmod m$ and $(i+1) \bmod m$, where $m$ is the total number of particles. It has been shown [22] that the relative performance of gbest and lbest depends on the type of the optimization function. In general, gbest performs better on unimodal functions, such as Sphere and Rosenbrock, whereas lbest is better suited for multimodal functions, in which the optimization success relies on the diversity of the algorithm to not get trapped in a local minimum.
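For the lbest model, $\hat{y}$ is computed per particle from its ring neighbors. A minimal sketch, reusing the array conventions of the PSO sketch above:

```python
def lbest_attractor(i, y, f_y):
    """Neighborhood best for particle i in the ring topology: the best
    personal best position among particles (i-1) mod m, i, and (i+1) mod m."""
    m = len(y)
    ring = [(i - 1) % m, i, (i + 1) % m]
    return y[min(ring, key=lambda j: f_y[j])]
```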
In [15], a neighborhood scheme has been explored that is defined by a particle's actual position in the search space. A certain number of close particles are considered to be neighbors. This method is computationally intensive, since in every iteration the distances between all pairs of particles have to be calculated.

A fitness-distance-ratio PSO (FDR-PSO) has been proposed in [23], in which each particle is not only influenced by the personal and global best position but also by a close and good neighbor. This neighbor is selected as the particle that maximizes the quotient of fitness improvement over the respective distance. Thus, any nearby improving neighbor of a particle can be preferred to the global best particle, provided it is close enough.

In [14] and [24], several neighborhood topologies or "sociometries" have been examined for the PSO algorithm. In [24], these topologies have also been applied to the fully informed PSO, in which each particle is influenced by all of its neighbors and not only by its best neighbor. The information flow within the swarm is controlled by two parameters $k$ and $C$, where $k$ gives the number of neighbors of a specific node in the neighborhood graph and $C$ is used to measure the clustering among the nodes in the neighborhood graph, i.e., to what extent the respective neighborhoods differ. The previously introduced neighborhoods gbest and lbest, the star—one central node is connected to all the other nodes—and other regular neighborhood graphs have been compared to randomly created neighborhood graphs with different values for $k$ and $C$. The von Neumann neighborhood on a two-dimensional (2-D) lattice performed very well, and a three-dimensional (3-D) pyramid also performed reasonably well. The common gbest, lbest, and star topologies were all rated worse.

III. HIERARCHICAL PSO

The hierarchical version of PSO (H-PSO) is introduced in this section. In H-PSO, all particles are arranged in a hierarchy that defines the neighborhood structure. Each particle is neighbored to itself and its parent in the hierarchy. In this paper, we study regular tree-like hierarchies, i.e., the underlying topology is a (nearly) regular tree. The hierarchy is defined by the height $h$, the branching degree $d$, i.e., the maximum number of children of the inner nodes, and the total number $m$ of nodes of the corresponding tree (for an example, see Fig. 1). In this paper we use only hierarchies in which all inner nodes have the same number of children; only the inner nodes on the deepest level might have a smaller number of children, such that the difference between the numbers of children of inner nodes on the deepest level is at most one.

Fig. 1. Example of a hierarchy defined by a regular tree with h = 3, d = 4, and m = 21.

In order to give the best individuals in the swarm a high influence, particles move up and down the hierarchy. In every iteration, after the evaluations of the objective function at the particles' actual positions, but before the update of the velocities and the determination of the new positions in the search space, the new positions of the particles within the hierarchy are determined as follows.
For every particle $i$ in a node of the tree, its own best solution is compared to the best solution found by the particles in the child nodes. If the best of these particles, say particle $j$, is better (i.e., $f(y_j) < f(y_i)$), then particles $i$ and $j$ swap their places within the hierarchy. These comparisons are performed starting from the top of the hierarchy and then proceed in a breadth-first manner down the tree. Observe that the top-down approach implies that in one iteration an individual can move down several levels in the hierarchy, but it can move up at most one level. The current global best particle will move up one level of the hierarchy at every iteration. Hence, it will be on top of the hierarchy after at most $h - 1$ iterations—unless a better solution has been found meanwhile.
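The top-down, breadth-first swap pass can be sketched as follows. The array-based tree layout (children of node k stored at positions kd+1, ..., kd+d) is our own illustrative choice for a regular hierarchy, not prescribed by the paper.

```python
def swap_pass(node, f_y, d):
    """One breadth-first swap pass over the hierarchy (Section III).

    node[k] is the index of the particle currently placed in tree node k;
    nodes are stored in breadth-first order, so the children of node k are
    nodes k*d + 1, ..., k*d + d. f_y[i] is the personal best function value
    of particle i (to be minimized).
    """
    n = len(node)
    for k in range(n):                                   # top of the tree first
        children = [c for c in range(k * d + 1, k * d + d + 1) if c < n]
        if not children:
            continue
        best = min(children, key=lambda c: f_y[node[c]])
        if f_y[node[best]] < f_y[node[k]]:               # best child beats parent
            node[k], node[best] = node[best], node[k]    # exchange the particles
```

Because parents are processed before their children, a particle can be pushed down several levels in one pass while it can rise by at most one level, exactly the property noted above.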
For the update of the velocities in H-PSO, a particle is influenced by its own so-far best position and by the best position of the individual that is directly above it in the hierarchy. This means that for particle $i$ the value of $\hat{y}$ in (1) equals $y_j$, where $j$ is the particle in the parent node of particle $i$. Only when particle $i$ is in the root of the tree does H-PSO use $\hat{y} = y_i$.

Similarly as in PSO, after the particles' velocities are updated and after the particles have moved in H-PSO, the objective function is evaluated at the new positions. If the function value at the new position is better than the function value at the personal best position, the new position is stored in $y_i$.

A. Neighborhood

We propose a new neighborhood scheme for PSO that uses the particle's so-far best found function value to define the neighborhood relations. This approach is similar to the lbest model in that only a certain fraction of the swarm is considered for the velocity update of a particle. But in our algorithm the neighborhoods are constantly changing, according to the fitness development of the individuals.

The changing arrangement of the particles can help preserve diversity in the search. In the described hierarchy, the arrangement of the particles leads to a different influence for the particles at different positions. The particle with the currently best found solution can (indirectly) influence all the other particles after it has reached the top of the hierarchy. This characteristic of H-PSO is similar to the gbest model.

B. Adapting the Hierarchy

It is to be expected that the structure of the hierarchy, i.e., the branching degree $d$, has a significant influence on the optimization behavior of H-PSO. For example, H-PSO with a high branching degree might perform better in the beginning of the optimization process because all particles are close to the top particle in the hierarchy. Moreover, this tree topology has a small diameter, similar to the gbest neighborhood, which, in general, optimizes faster than the PSO algorithm using the lbest neighborhood. On the other hand, the quality increase of the best solutions found by H-PSO with a smaller $d$ might be slower in the beginning of the optimization process, but it might improve the objective function value further at the end of the optimization process (see Section V-A).

These expectations have inspired the idea to dynamically change the branching degree of the hierarchy during a run. A related idea has been used in [15], where the neighborhood of the particles was gradually increased from lbest to gbest. Surprisingly, the opposite direction, which might be more effective, has not been tested. In the Adaptive H-PSO (AH-PSO) algorithm that is proposed here, the branching degree is gradually decreased during a run of the algorithm. In order to decrease the branching degree from $d$ to $d - 1$, the hierarchy is traversed starting at the root node. This is done so that always one of the direct subtrees below the considered node is removed if the number of children exceeds the new required branching degree (see Fig. 2 and the next paragraph for an example). The decision about which subtree to remove is based on the quality of the particles in the topmost nodes of all subtrees of the considered node, i.e., all children of the considered node. This procedure is repeated for the entire tree. After this first phase of the branching degree reduction procedure, the remaining tree is of branching degree $d - 1$ but has fewer nodes than before. The removed nodes are then evenly inserted at the bottom of the hierarchy, starting with the node on the second-to-last level which has the least number of successors. The removed nodes are appended one by one so that the numbers of children of all nodes on the second-to-last level differ by at most one. If all of these nodes have the maximum number of children, a new level is added to the hierarchy and the procedure is continued until all removed nodes are reinserted.
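A condensed sketch of the reduction step just described is given below. The explicit Node class, the breadth-first traversal, and the reinsertion rule (attach under the deepest nodes that still have fewer than the new maximum number of children, preferring those with the fewest children) are our own simplifications of the level-by-level procedure above, not the authors' implementation.

```python
from collections import deque

class Node:
    def __init__(self, particle):
        self.particle = particle          # index of the particle in this node
        self.children = []

def subtree_particles(node):
    """All particle indices in the subtree rooted at node."""
    return [node.particle] + [p for c in node.children
                              for p in subtree_particles(c)]

def reduce_degree(root, new_d, f_y):
    """Reduce the branching degree of the hierarchy to new_d (sketch)."""
    removed = []
    queue = deque([root])
    while queue:                                  # phase 1: detach subtrees
        node = queue.popleft()
        while len(node.children) > new_d:         # drop the best-rooted subtree
            best = min(node.children, key=lambda c: f_y[c.particle])
            node.children.remove(best)
            removed.extend(subtree_particles(best))
        queue.extend(node.children)
    for particle in removed:                      # phase 2: even reinsertion
        nodes, stack = [], [(root, 0)]
        while stack:
            nd, depth = stack.pop()
            nodes.append((nd, depth))
            stack.extend((c, depth + 1) for c in nd.children)
        max_depth = max(depth for _, depth in nodes)
        candidates = [(nd, depth) for nd, depth in nodes
                      if depth >= max_depth - 1 and len(nd.children) < new_d]
        parent = min(candidates, key=lambda c: (c[1], len(c[0].children)))[0]
        parent.children.append(Node(particle))
```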
Fig. 2. Example for adapting the hierarchy of AH-PSO (m = 20) from d = 4 to d = 3.

As an example, consider Fig. 2, where a hierarchy of 20 nodes and branching degree $d = 4$ is shown before and after the branching degree has been reduced to $d = 3$. The grey nodes are those that have first been removed and then appended at the bottom level. Note that none of the leaves of the right subtree have been removed, since their parent node already has branching degree 3.

In the test runs that have been done for this paper, the branching degree reduction operation is performed every $k$-th iteration of AH-PSO. Parameter $k$ is called the decrease frequency. At every branching degree reduction operation, the branching degree is decreased by $\Delta d$. This parameter is called the decrease step size. For $\Delta d > 1$, the reduction procedure is applied consecutively (i.e., the branching degree is always reduced in steps of 1) until the hierarchy has the required branching degree. This is done until a certain minimum branching degree $d_{min}$ is reached.

In order to decrease the branching degree of a node, it has to be decided which of its direct subtrees has to be removed. In preliminary experiments, we tested two strategies: removing the subtree with the worst root node or the one with the best root node. Since removing the best successor always provided better results in these preliminary tests, we use only this strategy for the experiments that are described in this paper. A possible explanation why it is worse to remove the subtree with the worst root could be that in this case the removed particles that are appended to the bottom of the hierarchy have only small chances to ascend in the hierarchy. Therefore, it is likely that they are not immediately useful for the further optimization process. The particles in the subtree with the best root node, on the other hand, already hold good personal best positions and can thus immediately contribute to the collective search process.

C. Specialization

A common feature of self-organizing swarm behavior in nature is the specialization of certain individuals to certain tasks. In such systems, the selection of which task an individual is working on is usually modeled to be triggered by an external stimulus. The stimulus is typically filtered by a threshold value that depends on the current state of the individual. In [25], a Division-of-Labor PSO algorithm has been presented that identifies two different tasks for the individuals: the regular exploration task and a so-called local search task. In the latter task, a particle is placed on the current global best position with a new random velocity vector. The general idea is that if a particle has not improved its fitness at an iteration, the threshold for engaging in the local search task gradually decreases.

The H-PSO tree topology facilitates the assignment of different tasks to the individuals. The particles are roughly ordered by fitness into the different levels of the hierarchy. This reflects a current, relative state of the individuals and can thus be used as a distinction for assigning different behaviors to the particles. Thus, instead of using a threshold-based division of labor approach, we associate various tasks with the particles in different levels of the hierarchy.

It is a common modification of the basic PSO algorithm to linearly decrease the value of parameter $w$ over time [17]. This is done to adjust the behavior of the swarm from exploration of the entire search space to exploitation of promising regions. We identify different search tasks by different inertia weights $w$. Each level of the hierarchy is assigned a certain weight $w_\ell$ and all particles in that level behave accordingly. The values $w_\ell$ are determined using either (3) or (4), with level $\ell \in \{0, \ldots, h-1\}$ (the root is on level 0) and the resulting $w_\ell \in [w_{min}, w_{max}]$:

$$w_\ell = w_{min} + \frac{\ell}{h-1}\,(w_{max} - w_{min}) \qquad (3)$$

$$w_\ell = w_{max} - \frac{\ell}{h-1}\,(w_{max} - w_{min}) \qquad (4)$$

The algorithm using (3) has the $w_\ell$ values decreasing from bottom to top of the hierarchy, with the root particle using $w_{min}$; this algorithm is denoted H-PSO↓. The algorithm using (4) inverts this assignment, with the root particle using $w_{max}$, and it is denoted H-PSO↑.

Different behaviors, controlled by different values of the inertia weight $w$ for each particle, have also been used by other authors before. In [26], the PSO algorithm has been combined with the concept of self-organized criticality. Each particle holds a current critical value that increases when the neighboring particles get closer. If enough criticality is accumulated, this criticality is dispersed, thereby possibly exceeding the criticality limit of other particles, and a relocation scheme is initiated. Possible relocation schemes include resetting the personal best positions or pushing the particle forward in its current direction of movement. Another approach varied the inertia weight $w$ according to the current criticality level of a particle.
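Returning to the level-dependent weights in (3) and (4), the per-level assignment is a one-line computation; the helper below is hypothetical scaffolding (a sketch assuming h >= 2), with the variant names matching the notation introduced above.

```python
def level_weight(level, h, w_min=0.4, w_max=0.729, variant="up"):
    """Inertia weight for a particle on the given level (root = level 0).

    "down" follows (3): w decreases from bottom to top, the root uses w_min.
    "up" follows (4): w increases from bottom to top, the root uses w_max.
    """
    frac = level / (h - 1)
    if variant == "down":                       # H-PSO(down), eq. (3)
        return w_min + frac * (w_max - w_min)
    return w_max - frac * (w_max - w_min)       # H-PSO(up), eq. (4)
```

For the hierarchy used later in Section V-C (m = 31, h = 3, d = 5, i.e., levels of size 1, 5, and 25), averaging these weights over all particles reproduces the values 0.692 for the variant following (3) and 0.437 for the variant following (4) quoted there.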
IV. EXPERIMENTS

In this section, the experiments that have been done to compare the different variants of PSO for continuous function optimization are described. Since, in this paper, we are interested in understanding whether the proposed modifications of the standard PSO algorithm can improve its performance, we focus our experimental evaluation on the comparison with other PSO algorithms. It should be noted, however, that PSO is known to be a competitive method which often produces results that are comparable to or even better than those produced by other metaheuristics (e.g., see [27]). In all our experiments, the PSO algorithms use the parameter values $w = 0.729$ and $c_1 = c_2 = 1.494$ as recommended in [28], unless stated otherwise. Each run has been repeated 100 times and average results are presented. The particles have been initialized with a random position and a random velocity, where in both cases the values in every dimension have been randomly chosen according to a uniform distribution over the initial range $[x_{min}; x_{max}]$. The values $x_{min}$ and $x_{max}$ depend on the objective function. During a run of an algorithm, the position and velocity of a particle have not been restricted to the initialization intervals, but a maximum velocity $v_{max}$ has been used for every component of the velocity vector $v_i$.

The set of test functions (see Table I) contains functions that are commonly used in the field of continuous function optimization. Table II shows the values that have been used for the dimension of these functions, the range of the corresponding initial positions and velocities of the particles, and the goals that have to be achieved by the algorithms. The first two functions (Sphere and Rosenbrock) are unimodal functions (i.e., they have a single local optimum that is also the global optimum) and the remaining four functions are multimodal (i.e., they have several local optima). All test runs have been run over 10 000 iterations.

TABLE I. TEST FUNCTIONS.

TABLE II. PARAMETERS FOR TEST FUNCTIONS.

When comparing different hierarchies for H-PSO, the swarm size $m$ has been kept constant for all different values of the branching degree $d$. The resulting hierarchies can then become irregular. As was described before, we appended the additional (with respect to the largest regular arrangement) nodes evenly below the last level. For example, the regular H-PSO topology with $h = 3$ and $d = 4$ has a resulting size of $m = 21$. For a swarm of size $m = 40$, the remaining 19 nodes are appended to the 16 nodes at the bottom level. Hence, each node receives one successor and three nodes receive a second successor.

The H-PSO and AH-PSO algorithms have been compared to the PSO algorithm using the gbest (PSO-g) and the lbest (PSO-l) neighborhoods. The swarm size that has been used in these experiments is $m = 40$. For H-PSO, the parameter values $h = 3$ and $d = 5$ have been used. The branching degree of AH-PSO started with a high value and has been decreased until the minimum branching degree $d_{min}$ was reached. For the multimodal test functions, $d_{min}$ has been set to the respective optimum branching degree for each test function, which was determined by comparing different values of $d$. The branching degree is decreased every $k$ iterations by $\Delta d$ steps. Several combinations of values for these parameters have been tested, but only the results for the combination with the best values are reported for each function.

In another experiment, the number of iterations required to reach a certain goal for each test function has been determined, comparing PSO-g, PSO-l, H-PSO, H-PSO↓, and H-PSO↑. The parameter values that have been used in this experiment are $h = 3$, $d = 5$, and $m = 31$. A swarm size of $m = 31$ has been used in order to obtain a regular hierarchy for the H-PSO variants and to stay close to the common swarm size of $m = 30$ that has been used, for example, in [28]. For H-PSO↓, the values of $w_\ell$ in the hierarchy vary from $w_{max} = 0.729$ at the bottom level to $w_{min} = 0.4$ at the root. Vice versa, for H-PSO↑ the parameter values vary from $w_{min} = 0.4$ to $w_{max} = 0.729$.¹ For PSO-g, PSO-l, and H-PSO, two different parameter sets taken from the literature have been used. One parameter set is $w = 0.6$ and $c_1 = c_2 = 1.7$, as suggested in [28] for a faster convergence rate. The other parameter set has the common parameter values $w = 0.729$ and $c_1 = c_2 = 1.494$. Algorithms that use the first (second) set of parameter values are denoted by appending "-a" (respectively "-b") to the name; e.g., PSO-g-a denotes PSO-g with the first parameter set.

¹We also tried PSO-g, PSO-l, and H-PSO with w = 0.4, which did not produce competitive results.

A. Significance

In the experiments, the number of iterations required to reach a specified goal for each function was recorded for each algorithm. If the goal was not reached within the maximum number of 10 000 iterations, the run was considered unsuccessful. The success rate denotes the percentage of successful runs. For the successful runs, the average, median, maximum, and minimum numbers of iterations required to achieve the goal value were calculated. The expected number of iterations has been determined as (average/success rate). We also evaluated the significance of observed differences between the algorithms.
TABLE III. STEPS REQUIRED TO ACHIEVE A CERTAIN GOAL FOR PSO-g, PSO-l, H-PSO, H-PSO↓, AND H-PSO↑: AVERAGE (AVG), MEDIAN (MED), MAXIMUM (MAX), MINIMUM (MIN), AND EXPECTED (EXP := AVG/SUCC) NUMBER OF ITERATIONS, AND SIGNIFICANCE MATRIX; "-a" AND "-b" DENOTE THE USED PARAMETER VALUES AS DESCRIBED IN SECTION IV.

For evaluating the significance, the Wilcoxon rank sum test has been used to compare the results of two algorithms. The results of the 100 test runs for two algorithms form two independent samples, but only the successful runs are considered; thus the used sample sizes can differ.
For two algorithms (X and Y), the distributions of their results, $F_X$ and $F_Y$, are compared using the null hypothesis $H_0: F_X = F_Y$ and the one-sided alternative $H_1: F_X < F_Y$. The tests were performed at a fixed significance level. In the results section, the significance comparison among a set of algorithms is displayed using a matrix in which an entry "X" at position $(i, j)$ denotes that algorithm $i$ is significantly better, i.e., it provides smaller values, than algorithm $j$ (where $i$ and $j$ are the positions in the corresponding result table). For example, the leftmost X in the first row of Table III indicates that algorithm PSO-g-a is significantly better than algorithm PSO-l-a. An entry "-" at position $(i, j)$ indicates that algorithm $i$ is not significantly better than algorithm $j$. The entries on the main diagonal are left empty. We call the corresponding matrix a significance matrix (s-matrix).
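As a sketch of this evaluation step, SciPy's mannwhitneyu (the Mann-Whitney U form of the rank sum test, which handles unequal sample sizes) can stand in for the test described above; the significance level alpha=0.05 below is a placeholder, not the value used in the paper.

```python
from scipy.stats import mannwhitneyu

def significantly_better(iters_x, iters_y, alpha=0.05):
    """True if algorithm X needed significantly fewer iterations than Y.

    iters_x, iters_y: iteration counts of the successful runs only, so the
    two samples may have different sizes. One-sided alternative: F_X < F_Y.
    """
    _, p = mannwhitneyu(iters_x, iters_y, alternative="less")
    return p < alpha
```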

B. Diversity

The hierarchy of H-PSO divides the swarm into several subswarms of particles that are located in different subtrees. This division is not persistent, as the particles can move between these subswarms. We study in this paper whether the different subtrees in an H-PSO hierarchy can still specialize to different regions of the search space, as intended by the design of H-PSO. Therefore, the diversity of the particles in the subtrees has been measured. The corresponding test runs were done with parameter values $m = 31$, $h = 3$, and $d = 5$. The resulting topology consists of five subtrees below the root node, each subtree containing six nodes. The diversity has been measured for each of these five subtrees. Since the size of a particle set influences the diversity values, we cannot directly compare the diversity of the particles within a subtree with the diversity of the whole swarm. Therefore, the diversity within a subtree is also compared to the diversity of a subset of six particles that have been selected randomly with uniform probability from the whole swarm.

Determining the diversity within the swarm is based on the "distance-to-average-point" measure $D_a(S, t)$ given by (5), where $S$ is a subset of the swarm and $|S|$ the size of $S$. The problem dimension is denoted by $n$, and $x_{ij}$ is the $j$-th component of $x_i$. The average point of the particles in $S$ at iteration $t$ is given by $\bar{x}_j(t) = \frac{1}{|S|} \sum_{i \in S} x_{ij}(t)$:

$$D_a(S, t) = \frac{1}{|S|} \sum_{i \in S} \sqrt{\sum_{j=1}^{n} \left( x_{ij}(t) - \bar{x}_j(t) \right)^2} \qquad (5)$$

The "distance-to-average-point" measure returns an unbounded, absolute value that depends on the considered test function and the current diameter of the swarm. For comparison, we want to use a measure that is independent of the test function and the state of the optimization process. Therefore, $D_a(S, t)$ is scaled by the diameter $diam(t)$ of the swarm. The diameter is determined as the maximum distance between any two particles in the entire swarm (not just particles from subset $S$)—see (6):

$$diam(t) = \max_{i, k} \sqrt{\sum_{j=1}^{n} \left( x_{ij}(t) - x_{kj}(t) \right)^2} \qquad (6)$$

As the diversity measure of a subset $S$ of the swarm at iteration $t$, the measure $D(S, t) = D_a(S, t) / diam(t)$ is used. Scaling by $diam(t)$ preserves the relations between different subsets, since all measures are scaled by the same value.
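A direct NumPy transcription of (5) and (6), under the assumption that particle positions are stored as rows of 2-D arrays:

```python
import numpy as np

def diversity(subset, swarm):
    """Scaled diversity D(S, t) = D_a(S, t) / diam(t), cf. (5) and (6).

    subset: (|S|, n) array of the positions of the particles in S
    swarm:  (m, n) array of the positions of the entire swarm at iteration t
    """
    centroid = subset.mean(axis=0)                          # average point of S
    d_a = np.linalg.norm(subset - centroid, axis=1).mean()  # (5)
    pairwise = swarm[:, None, :] - swarm[None, :, :]
    diam = np.linalg.norm(pairwise, axis=2).max()           # (6), whole swarm
    return d_a / diam
```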
V. RESULTS

A. Branch Degree

The results of the experiment in which the influence of the branch degree $d$ of the H-PSO hierarchy on the optimization behavior has been investigated are shown in Fig. 3 (for the unimodal test functions) and in Fig. 4 (for the multimodal test functions). The branch degree $d$ was varied over a range of values, with the swarm size kept constant at $m = 40$. For every test function, the solution qualities that were achieved by the algorithms using the different parameter settings have been ranked at each iteration, and for each value of $d$ the average rank (over the unimodal respectively multimodal test functions) is shown. In the figures, the ranks for the different branch degrees are displayed at several iterations during the run.

Fig. 3. Average ranks for unimodal functions for H-PSO for different branch degrees (m = 40).

Fig. 4. Average ranks for multimodal functions for H-PSO for different branch degrees (m = 40).
The results for the two unimodal test functions show that higher branch degrees lead to a better optimization behavior over the entire duration of the test runs. Only for the highest branching degrees do the average ranks not differ much, because the corresponding hierarchies become very similar. For the multimodal test functions, the results show a clearly different behavior. In the beginning of the optimization process, the H-PSO algorithms with higher branching degrees again achieved better ranks, but this tendency is inverted for later iterations, in which the smaller degrees are better. This shows that a flexible topology in which the branch degree changes during the optimization process might lead to better results for multimodal test functions.

B. Topology

In this subsection, the H-PSO and AH-PSO algorithms (with parameter values $h = 3$ and $d = 5$) are compared to PSO using the two neighborhood topologies gbest (PSO-g) and lbest (PSO-l). The swarm size that has been used is $m = 40$. As shown in the last subsection, the optimization success of a neighborhood topology (gbest, lbest) highly depends on the kind of function to be optimized (unimodal or multimodal). For Sphere (Fig. 5) and Rosenbrock (Fig. 6), PSO-g performs better than PSO-l. On the other hand, PSO-l achieves better results than PSO-g for Rastrigin (Fig. 7), Griewank (Fig. 8), Schaffer (Fig. 9), and Ackley (Fig. 10).

Fig. 5. Sphere—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.

Fig. 6. Rosenbrock—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.

Fig. 7. Rastrigin—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.

Fig. 8. Griewank—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.

Fig. 9. Schaffer—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.

Fig. 10. Ackley—solution quality for PSO-g, PSO-l, H-PSO (h = 3, d = 5), and AH-PSO; swarm size m = 40.
The results show that the optimization behavior of H-PSO does not depend so much on the specific function type. H-PSO produces competitive results for all the considered functions, regardless of whether they are unimodal or multimodal. For all test functions, H-PSO performs clearly better than the worse of PSO-l and PSO-g. On some test functions (Rastrigin and Schaffer), it performs even better than both PSO versions. It should be noted that these results have been obtained with the same fixed branch degree $d = 5$. Hence, this value can be used as a recommendation for the parameter choice. Nevertheless, it can be expected that the results of H-PSO for a specific test function can in general be improved by adapting the value of $d$.

Using the adaptive H-PSO (AH-PSO) algorithm, the results of H-PSO could be improved further. For every test function, the minimum branch degree $d_{min}$ of AH-PSO is set to the optimal branch degree of the respective function (determined by the experiments in Section V-A). The AH-PSO parameters are displayed as AH-PSO($d_{min}$, $k$, $\Delta d$) for the minimum branch degree $d_{min}$, the decrease frequency $k$, and the decrease step size $\Delta d$.

For the unimodal test functions—Rosenbrock and Sphere—the AH-PSO algorithm that decreases the branch degree only every 1000 iterations by 1 performs best. The branch degree is thus only moderately reduced by the end of the algorithm. AH-PSO(2,1000,1) performs better than the best performing H-PSO for the Rosenbrock function and very similarly to the best H-PSO for the Sphere function. For all multimodal test problems, the best performing AH-PSO algorithms reduce the branch degree to the optimum degree very fast. In the case of the Rastrigin function, AH-PSO(4,5,4) reaches the optimum degree $d_{min} = 4$ after 20 iterations and performs better than the best H-PSO. Also for the other multimodal test functions, AH-PSO performs better than H-PSO with the optimum branch degree. It can be concluded that a fast adaption of the hierarchy during the initial phase of a run seems to be advantageous for multimodal functions, whereas optimization of unimodal functions can benefit from a slow decrease of $d$ over the entire running time.

C. Specialization

In this section, H-PSO and its weighted variants are compared with PSO-l and PSO-g. The comparison is based on the number of iterations required to reach a certain goal for each of the test functions. The algorithms all used swarm size $m = 31$. In Table III, the average, median, maximum, and minimum numbers of iterations required to achieve the goal value are shown. Also, the success rate and the expected number of iterations (average/success rate) to reach the goal are given.

H-PSO↑ is always among the fastest algorithms to achieve the desired goal. Only for the Rastrigin function does PSO-g-a require a lower average number of iterations: 104.0, where H-PSO↑ takes 184.4. For all other test functions, H-PSO↑ obtains the lowest average and expected numbers of iterations required to reach the goal.

The success rates of H-PSO↑ and H-PSO↓ are in general very high. An exception is the Ackley function, for which H-PSO↑ and H-PSO↓ (with $d = 5$) achieve success rates of only 0.01 and 0.13, respectively. Also, PSO-g has a very low success rate for this function, in contrast to PSO-l, which has a high success rate. Therefore, H-PSO↑ and H-PSO↓ have also been tested with a small branching degree of $d = 2$ (these values are included in the table). For the small branching degree, H-PSO↑ has a very high success rate and finds a solution of the required quality on average in fewer iterations than the PSO-l algorithms.

The significance comparison shows that only for the Rastrigin function is PSO-g-a significantly better than H-PSO↑. H-PSO↑ performs better for the Sphere, Rosenbrock, and Griewank functions, and also for the Ackley function, considering that PSO-g-a reached the goal only once in all 100 test runs.

The results show that H-PSO↑ can reach a required goal significantly faster than PSO and H-PSO. This demonstrates that H-PSO is a very promising algorithm. The results also indicate that a heterogeneous swarm of particles with different values for $w$ seems advantageous.

In order to explain the different optimization behavior of H-PSO↓ and H-PSO↑, recall that their parameter values $w_\ell$ range from 0.4 to 0.729. Since the majority of the swarm is located in the lower levels of the hierarchy, most of the individuals of H-PSO↓ use a value of $w_\ell$ close to $w_{max}$. The average $w_\ell$ for H-PSO↓, with $m = 31$, $h = 3$, and $d = 5$, is 0.692, compared to 0.437 for H-PSO↑. This smaller impact of the previous velocity compared to the personal best and neighborhood best attractors in H-PSO↑ could explain why H-PSO↑ converges faster than H-PSO↓.

Another possible explanation can be based on investigations of the convergence behavior of a deterministic version of PSO. In [29] it was shown for the deterministic PSO that the update equation is stable only if $w > (c_1 + c_2)/2 - 1$. Thus, for $c_1 = c_2 = 1.494$, the particle trajectory would diverge for $w = 0.4$ and converge for $w = 0.729$. Hence, $w$ must be sufficiently large for a given value of $c_1$ and $c_2$, or the trajectory is not stable. Assuming that the results for the deterministic PSO can be transferred, the particle on top of H-PSO↓ would not have a smooth trajectory but would jump around more randomly compared to the top particle of H-PSO↑.
But care has to be taken with such an explanation, since the stochastic PSO variants use $c_1 r_1$ and $c_2 r_2$ with expected values of $c_1/2$ and $c_2/2$, for $r_1$ and $r_2$ randomly chosen from $[0, 1]$. For these expected values, the deterministic PSO would converge even with $w = 0.4$. Our observations have shown that, for the test functions, a more likely explanation is that the top particle in H-PSO↓ tends to converge too early to the actual point of attraction and is then replaced by a better individual (this might also explain the higher number of swaps for H-PSO↓).
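The stability threshold from [29] is easy to evaluate for the parameter values discussed here; a small check, with the expected-coefficient case from the caveat above included:

```python
def is_stable(w, c1, c2):
    """Deterministic-PSO stability condition from [29]: w > (c1 + c2)/2 - 1."""
    return w > (c1 + c2) / 2.0 - 1.0

print(is_stable(0.4, 1.494, 1.494))    # False: threshold 0.494, diverges
print(is_stable(0.729, 1.494, 1.494))  # True: converges
# With the expected stochastic coefficients c1/2 = c2/2 = 0.747, the
# threshold drops to -0.253, so even w = 0.4 satisfies the condition:
print(is_stable(0.4, 0.747, 0.747))    # True
```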
During a run of H-PSO, the individuals at different levels of the hierarchy are swapped according to their current fitness. In H-PSO↓, the root particle uses $w_{min}$ and is thus slowed down. In H-PSO↑, on the other hand, the individuals on the upper levels move faster. In order to investigate the consequences, the number of swaps that occur during a run has been measured. In H-PSO↓, most of the swaps take place at the top of the hierarchy. In the test runs, the ratio of swaps that occur between levels 0 and 1 to swaps that occur between levels 1 and 2 is approximately 0.8 to 1, depending on the objective function. The same ratio for H-PSO↑ is much lower, only approximately 0.2 to 0.4. The ratio of swaps between levels 0 and 1 to swaps between levels 1 and 2 is displayed in Fig. 11 for H-PSO↓ and H-PSO↑.

Fig. 11. Ratio of the number of swaps between level 0 and 1 to the number of swaps between level 1 and 2 for H-PSO↓ and H-PSO↑.

D. Diversity

The results about the diversity within the different subswarms of H-PSO are presented in this subsection. The diversity measure introduced in Section IV-B is taken for H-PSO with parameter values $m = 31$, $h = 3$, and $d = 5$. The resulting topology consists of five subtrees below the root node, each subtree containing six nodes. The diversity that was measured for the different test functions within the five subtrees of H-PSO and within a randomly selected subset is displayed in Fig. 12 and Fig. 13.

Fig. 12. Diversity D(S, t) for H-PSO with m = 31, h = 3, d = 5 of the five subtrees (grey) and a random subset (black) for the unimodal functions Sphere and Rosenbrock (top to bottom).

Fig. 13. Diversity D(S, t) for H-PSO with m = 31, h = 3, d = 5 of the five subtrees (grey) and a random subset (black) for the multimodal functions Rastrigin, Griewank, Schaffer, and Ackley (top to bottom).

For the unimodal test functions (Fig. 12), the diversity within the randomly selected subset is almost identical to the diversity within the subtrees. The diversity values are especially similar for the Rosenbrock function. Since there is only one local optimum for these functions, the subswarms of the H-PSO algorithm do not concentrate on different regions of the search space. This is clearly different for the multimodal functions, for which the diversity among the particles in each subtree is distinctly smaller than for particles in a random subswarm of the same size. This demonstrates that the particles of different subtrees concentrate their search effort on different areas of the search space—in multimodal environments—as intended by the design of H-PSO.

The number of swaps that occurred anywhere in the hierarchy over the entire run of 10 000 iterations has also been measured. This number is significantly larger for the unimodal test functions (more than 8000), and the total number of swaps up to a certain iteration increases almost linearly with the number of iterations. For the multimodal functions, the observed total number of swaps is less than 2200. Except for the Rastrigin function, the main fraction of all swaps is done very early during a run [99% of all swaps are done by iteration 1680 (Griewank), 2470 (Schaffer), and 2240 (Ackley)]. Roughly speaking, at these iterations the behavior of the diversity curves changes (see Fig. 13). For the Griewank and Ackley functions, this behavior looks very similar: the diversity distance between subtrees and random subset increases, because the subtrees concentrate on different parts of the search space after an initial phase. Only very few particles move between subtrees then. For the Rastrigin and Schaffer functions, the subtrees concentrate the search on different regions of the search space almost right from the start.
VI. CONCLUSION

In this paper we have introduced a hierarchical version of PSO, called H-PSO, in which the particles are arranged in a dynamic hierarchy. This hierarchy gives the particles different influence on the rest of the swarm with respect to their current fitness. Hierarchies with different tree topologies have been studied for a standard set of test functions (Sphere, Rosenbrock, Rastrigin, Griewank, Schaffer's f6, Ackley) and have been compared to several variants of PSO. The H-PSO algorithm performed well on all of the considered test functions, regardless of their type (unimodal or multimodal).

Moreover, a variant of H-PSO (AH-PSO) with a dynamically changing branching degree of the tree topology has been introduced, which could improve the performance of H-PSO. Another extension of H-PSO is to use different values for the inertia weight $w$ of the particles according to their level in the hierarchy. It has been shown that this algorithm is able to reach a specified goal for every test function (except the Rastrigin function) faster than all other variants of PSO.

In order to better understand the observed optimization behavior of H-PSO, the diversity of different subswarms of H-PSO and the number of swaps that occur within the hierarchy have been examined. These observations indicate that, in spite of the dynamic nature of the hierarchy, the subtrees are able to specialize to certain regions of the search space in the case of multimodal functions.

One interesting topic for future research is to identify conditions that could trigger the decrease of the branching degree for AH-PSO so that it could autonomously adapt to the state of the optimization process.

REFERENCES

[1] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, 1995, pp. 1942–1947.
[2] F. Glover and G. Kochenberger, Eds., Handbook of Metaheuristics. Norwell, MA: Kluwer, 2002.
[3] V. Cerný, "A thermodynamical approach to the traveling salesman problem," J. Optim. Theory Applicat., vol. 45, no. 1, pp. 41–51, 1985.
[4] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.
[5] J. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.
[6] L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence Through Simulated Evolution. New York: Wiley, 1966.
[7] I. Rechenberg, Evolutionsstrategie—Optimierung technischer Systeme nach Prinzipien der biologischen Information. Freiburg, Germany: Fromman Verlag, 1973.
[8] H.-P. Schwefel, Numerical Optimization of Computer Models. Chichester, U.K.: Wiley, 1981.
[9] M. Dorigo, V. Maniezzo, and A. Colorni, "Ant system: optimization by a colony of cooperating agents," IEEE Trans. Syst., Man, Cybern. B, vol. 26, no. 1, pp. 29–41, Feb. 1996.
[10] M. Dorigo, G. Di Caro, and L. M. Gambardella, "Ant algorithms for discrete optimization," Artif. Life, vol. 5, no. 2, pp. 137–172, 1999.
[11] M. Dorigo and T. Stützle, Ant Colony Optimization. Cambridge, MA: MIT Press, 2004.
[12] M. Guntsch and M. Middendorf, "A population based approach for ACO," in Proc. 2nd Eur. Workshop on Evolutionary Computation in Combinatorial Optimization (EvoCOP 2002), vol. 2279, 2002, pp. 72–81.
[13] J. Kennedy and R. C. Eberhart, "A discrete binary version of the particle swarm optimization," in Proc. Conf. Systems, Man, and Cybernetics, 1997, pp. 4104–4109.
[14] J. Kennedy and R. Mendes, "Population structure and particle swarm performance," in Proc. Congr. Evolutionary Computation (CEC 2002), 2002, pp. 1671–1676.
[15] P. N. Suganthan, "Particle swarm optimizer with neighborhood operator," in Proc. Congr. Evolutionary Computation (CEC 1999), 1999, pp. 1958–1962.
[16] J. Kennedy, "Stereotyping: improving particle swarm performance with cluster analysis," in Proc. Congr. Evolutionary Computation (CEC 2000), 2000, pp. 1507–1512.
[17] Y. Shi and R. C. Eberhart, "Parameter selection in particle swarm optimization," in Proc. Evolutionary Programming VII, vol. 1447, 1998, pp. 591–600.
[18] T. Krink, J. S. Vesterstrøm, and J. Riget, "Particle swarm optimization with spatial particle extension," in Proc. IEEE Congr. Evolutionary Computation (CEC 2002), 2002, pp. 1474–1479.
[19] T. M. Blackwell, "Swarms in dynamic environments," in Proc. Genetic and Evolutionary Computation (GECCO 2003), vol. 2723, 2003, pp. 1–12.
[20] J. Riget and J. S. Vesterstrøm, "A Diversity-Guided Particle Swarm Optimizer—The ARPSO," Dept. Comput. Sci., Univ. of Aarhus, Aarhus, Denmark, Tech. Rep. 2002-02, 2002.
[21] A. Silva, A. Neves, and E. Costa, "An empirical comparison of particle swarm and predator prey optimization," in Proc. 13th Irish Int. Conf. Artificial Intelligence and Cognitive Science, vol. 2464, 2002, pp. 103–110.
[22] J. Kennedy, "Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance," in Proc. Congr. Evolutionary Computation (CEC 1999), vol. 3, 1999, pp. 1931–1938.
[23] K. Veeramachaneni, T. Peram, C. Mohan, and L. A. Osadciw, "Optimization using particle swarms with near neighbor interactions," in Proc. Genetic and Evolutionary Computation (GECCO 2003), vol. 2723, 2003, pp. 110–121.
[24] J. Kennedy and R. Mendes, "Neighborhood topologies in fully-informed and best-of-neighborhood particle swarms," in Proc. IEEE Int. Workshop on Soft Computing in Industrial Applications, 2003, pp. 45–50.
[25] J. S. Vesterstrøm, J. Riget, and T. Krink, "Division of labor in particle swarm optimization," in Proc. Congr. Evolutionary Computation (CEC 2002), 2002, pp. 1570–1575.
[26] M. Løvbjerg and T. Krink, "Extending particle swarm optimizers with self-organized criticality," in Proc. Congr. Evolutionary Computation (CEC 2002), 2002, pp. 1588–1593.
[27] J. Kennedy and W. M. Spears, "Matching algorithms to problems: an experimental test of the particle swarm and some genetic algorithms on the multimodal problem generator," in Proc. Int. Conf. Evolutionary Computation, 1998, pp. 78–83.
[28] I. C. Trelea, "The particle swarm optimization algorithm: convergence analysis and parameter selection," Inform. Process. Lett., vol. 85, 2003.
[29] F. van den Bergh, "An Analysis of Particle Swarm Optimizers," Ph.D. dissertation, Dept. Comput. Sci., Univ. Pretoria, Pretoria, South Africa, 2002.

Stefan Janson received the diploma in computer science from the University of Karlsruhe, Karlsruhe, Germany, in 2002. He is currently working on the German Research Council (DFG) project "Methods of Swarm Intelligence on Reconfigurable Architectures" at the University of Leipzig, Leipzig, Germany.

Martin Middendorf (M'98) received the diploma degree in mathematics and the Dr.rer.nat. degree from the University of Hannover, Hannover, Germany, in 1988 and 1992, respectively. He also received the Habilitation degree in 1998 from the University of Karlsruhe, Karlsruhe, Germany. He has previously been with the University of Dortmund, Dortmund, Germany, and the University of Hannover as a Visiting Professor of Computer Science. He was a Professor of Computer Science at the Catholic University of Eichstätt, Eichstätt, Germany. Currently he is a Professor for Parallel Computing and Complex Systems with the University of Leipzig, Leipzig, Germany. His research interests include reconfigurable architectures, parallel algorithms, algorithms from nature, and bioinformatics.
