Notes EA
1 Introduction
Fig. 1-1: Problem solution using evolutionary algorithms
Different main schools of evolutionary algorithms have evolved during the last 40 years:
genetic algorithms, mainly developed in the USA by J. H. Holland [Hol75]; evolutionary
strategies, developed in Germany by I. Rechenberg [Rec73] and H.-P. Schwefel [Sch81];
and evolutionary programming [FOW66]. Each of these constitutes a different approach;
however, all are inspired by the same principles of natural evolution. A good introductory
survey can be found in [Fdb94a].
Chapter 10 explains how complete optimization algorithms can be created from the
different evolutionary operators. The respective options are discussed in detail. Each of the
presented optimization algorithms represents an evolutionary algorithm.
Chapter 11 lists all the used references and a large number of other publications from the
field of Evolutionary Algorithms.
GEATbx: Genetic and Evolutionary Algorithm Toolbox for use with Matlab - www.geatbx.com.
The Genetic and Evolutionary Algorithm Toolbox is not public domain.
© 1994-2006 Hartmut Pohlheim, All Rights Reserved, ([email protected]).
2 Overview
Evolutionary algorithms are stochastic search methods that mimic the metaphor of natural
biological evolution. Evolutionary algorithms operate on a population of potential solutions
applying the principle of survival of the fittest to produce better and better approximations to
a solution. At each generation, a new set of approximations is created by the process of
selecting individuals according to their level of fitness in the problem domain and breeding
them together using operators borrowed from natural genetics. This process leads to the
evolution of populations of individuals that are better suited to their environment than the
individuals that they were created from, just as in natural adaptation.
At the beginning of the computation a number of individuals (the population) are randomly
initialized. The objective function is then evaluated for these individuals. The first/initial
generation is produced.
If the optimization criteria are not met the creation of a new generation starts. Individuals
are selected according to their fitness for the production of offspring. Parents are
recombined to produce offspring. All offspring will be mutated with a certain probability. The
fitness of the offspring is then computed. The offspring are inserted into the population
replacing the parents, producing a new generation. This cycle is performed until the
optimization criteria are reached.
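To make this loop concrete, here is a minimal sketch in Python (illustrative only, not the GEATbx implementation, which is a Matlab toolbox), using a toy bit-string objective:

import random

POP_SIZE, N_VARS, MAX_GEN = 50, 20, 200

def objective(ind):                       # toy objective: count the ones
    return sum(ind)

def select(pop, k):                       # fitness-proportional selection
    return random.choices(pop, weights=[objective(i) + 1 for i in pop], k=k)

def recombine(p1, p2):                    # single-point crossover
    cut = random.randrange(1, N_VARS)
    return p1[:cut] + p2[cut:]

def mutate(ind, rate=1.0 / N_VARS):       # flip each bit with low probability
    return [1 - g if random.random() < rate else g for g in ind]

population = [[random.randint(0, 1) for _ in range(N_VARS)]
              for _ in range(POP_SIZE)]
for gen in range(MAX_GEN):
    if max(map(objective, population)) == N_VARS:   # optimization criterion met
        break
    offspring = [mutate(recombine(*select(population, 2)))
                 for _ in range(POP_SIZE)]
    # reinsertion: keep the best POP_SIZE of parents plus offspring
    population = sorted(population + offspring, key=objective)[-POP_SIZE:]

print(max(map(objective, population)))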
Such a single-population evolutionary algorithm is powerful and performs well on a wide
variety of problems. However, better results can be obtained by introducing multiple
subpopulations. Every subpopulation evolves in isolation over a few generations (like the
single-population evolutionary algorithm) before one or more individuals are exchanged
between the subpopulations. The multi-population evolutionary algorithm models the
evolution of a species in a way more similar to nature than the single-population
evolutionary algorithm does. The figure shows the structure of such an extended
multi-population evolutionary algorithm.
From the above discussion, it can be seen that evolutionary algorithms differ substantially
from more traditional search and optimization methods. The most significant differences
are:
The following sections list some methods and operators of the main parts of Evolutionary
Algorithms. A thorough explanation of the operators will be given in the following chapters.
2.1 Selection
Selection determines which individuals are chosen for mating (recombination) and how
many offspring each selected individual produces. The first step is fitness assignment, for
example by rank-based fitness assignment (see Chapter 3). The actual selection is
performed in the next step: parents are selected according to their fitness by means of one
of the algorithms described in Chapter 3.
2.2 Recombination
All representations:
discrete recombination, see Subsection 4.1 (known from the recombination of real
valued variables); corresponds with uniform crossover, see Subsection 4.3.2
(known from the recombination of binary valued variables).
Real valued recombination, see Section 4.2:
intermediate recombination, see Subsection 4.2.1,
line recombination, see Subsection 4.2.2,
extended line recombination, see Subsection 4.2.3.
Binary valued recombination, see Section 4.3:
single-point / double-point /multi-point crossover, see Subsection 4.3.1,
uniform crossover, see Subsection 4.3.2,
shuffle crossover, see Subsection 4.3.3,
crossover with reduced surrogate, see Subsection 4.3.4.
For the recombination of binary valued variables the name 'crossover' is established. This
has mainly historical reasons. Genetic algorithms mostly used binary variables and the
name 'crossover'. Both notions (recombination and crossover) are equivalent in the area of
Evolutionary Algorithms. For consistency, throughout this study the notion 'recombination'
will be used (except when referring to specially named methods or operators).
2.3 Mutation
After recombination every offspring undergoes mutation. Offspring variables are mutated
by small perturbations (size of the mutation step) with low probability. The representation
of the variables determines the algorithm used. Two operators are explained: mutation of
real variables (Section 5.1) and mutation of binary variables (Section 5.2).
2.4 Reinsertion
After producing offspring they must be inserted into the population. This is especially
important if fewer offspring are produced than the size of the original population. Another
case is when not all offspring are to be used at each generation, or when more offspring are
generated than needed. A reinsertion scheme determines which individuals should be
inserted into the new population and which individuals of the population will be replaced by
offspring.
Based on the regional population model, the application of multiple different strategies at the
same time is possible. This is done by applying different operators and parameters for each
subpopulation. For an efficient distribution of resources during an optimization, competing
subpopulations are used.
3 Selection
In selection the offspring-producing individuals are chosen. The first step is fitness
assignment. Each individual in the selection pool receives a reproduction probability
depending on its own objective value and the objective values of all other individuals in the
selection pool. This fitness is used for the actual selection step afterwards.
Throughout this section some terms are used for comparing the different selection
schemes. The definitions of these terms follow [Bak87] and [BT95].
selective pressure:
probability of the best individual being selected compared to the average probability
of selection of all individuals
bias:
absolute difference between an individual's normalized fitness and its expected
probability of reproduction
spread:
range of possible values for the number of offspring of an individual
loss of diversity:
proportion of individuals of a population that is not selected during the selection phase
selection intensity:
expected average fitness value of the population after applying a selection method to
the normalized Gaussian distribution
selection variance:
expected variance of the fitness distribution of the population after applying a
selection method to the normalized Gaussian distribution
3.1 Rank-based fitness assignment
In rank-based fitness assignment, the population is sorted according to the objective
values. The fitness assigned to each individual depends only on its position in this
ranking and not on the actual objective value.
Rank-based fitness assignment overcomes the scaling problems of the proportional fitness
assignment. (Stagnation in the case where the selective pressure is too small or premature
convergence where selection has caused the search to narrow down too quickly.) The
reproductive range is limited, so that no individuals generate an excessive number of
offspring. Ranking introduces a uniform scaling across the population and provides a simple
and effective way of controlling selective pressure.
Rank-based fitness assignment behaves in a more robust manner than proportional fitness
assignment and, thus, is the method of choice. [BH91], [Why89]
Linear ranking (selective pressure SP in [1.0, 2.0]; Pos is the position of the individual in
the ranked population, with Pos = 1 the least fit and Pos = Nind the fittest individual):

Fitness(Pos) = 2 - SP + 2·(SP - 1)·(Pos - 1)/(Nind - 1)   (3-1)

Non-linear ranking (permitting higher selective pressures, SP in [1, Nind - 2]):

Fitness(Pos) = Nind·X^(Pos-1) / sum(X^(i-1), i = 1..Nind)   (3-2)

where X is computed as the root of the polynomial:

0 = (SP - 1)·X^(Nind-1) + SP·X^(Nind-2) + ... + SP·X + SP   (3-3)
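For illustration, here is a small Python sketch of linear rank-based fitness assignment
following equation (3-1) (illustrative only; the function name is mine, not the toolbox's):

def linear_ranking(objectives, sp=2.0, minimize=True):
    nind = len(objectives)
    # position 1 = worst individual, position nind = best individual
    order = sorted(range(nind), key=lambda i: objectives[i], reverse=minimize)
    fitness = [0.0] * nind
    for pos, i in enumerate(order, start=1):
        fitness[i] = 2 - sp + 2 * (sp - 1) * (pos - 1) / (nind - 1)
    return fitness

# 11 individuals, SP = 2: fitness runs linearly from 0.0 (worst) to 2.0 (best)
print(linear_ranking(list(range(11))))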
The probability of each individual being selected for mating depends on its fitness
normalized by the total fitness of the population.
The fitness values of the individuals for various values of the selective pressure (assuming
a population of 11 individuals and a minimization problem) follow from equations (3-1) to (3-3).
For linear ranking the selection intensity, loss of diversity and selection variance can be
given analytically [BT95]:
Selection intensity:
SelInt(SP) = (SP - 1)·(1/sqrt(pi))   (3-4)
Loss of diversity:
LossDiv(SP) = (SP - 1)/4   (3-5)
Selection variance:
SelVar(SP) = 1 - (SP - 1)^2/pi   (3-6)
3.2 Multi-objective ranking
In multi-objective problems an order of the individuals is established from reciprocal
(pairwise) comparisons of the individuals - multi-objective ranking. After this order has
been established, the single-objective ranking methods from Subsection 3.1 can be used
to convert the order of the individuals to corresponding fitness values.
3.3 Roulette wheel selection
The simplest selection scheme is roulette-wheel selection, also called stochastic sampling
with replacement [Bak87]. This is a stochastic algorithm and involves the following
technique:
The individuals are mapped to contiguous segments of a line, such that each individual's
segment is equal in size to its fitness. A random number is generated and the individual
whose segment spans the random number is selected. The process is repeated until the
desired number of individuals (called the mating population) is obtained. This technique is
analogous to a roulette wheel with each slice proportional in size to the fitness.
The table below shows the selection probability for 11 individuals with linear ranking and a
selective pressure of 2, together with the fitness values. Individual 1 is the most fit
individual and occupies the largest interval, whereas individual 10, as the second least fit
individual, has the smallest interval on the line. Individual 11, the least fit individual, has a
fitness value of 0 and gets no chance of reproduction.
Number of individual 1 2 3 4 5 6 7 8 9 10 11
fitness value 2.0 1.8 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0
selection probability 0.18 0.16 0.15 0.13 0.11 0.09 0.07 0.06 0.03 0.02 0.0
For selecting the mating population, the appropriate number of random numbers (uniformly
distributed between 0.0 and 1.0) is generated independently. Each trial selects the
individual whose segment spans the random number. For the example in the table, six
such sample trials selected the following individuals.
After selection the mating population consists of the individuals:
1, 2, 3, 5, 6, 9.
The roulette-wheel selection algorithm provides a zero bias but does not guarantee
minimum spread.
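A Python sketch of roulette-wheel selection as described (illustrative only):

import random

def roulette_wheel(fitness, n_select):
    total = sum(fitness)
    cum, acc = [], 0.0
    for f in fitness:                 # contiguous segments of the line [0, total]
        acc += f
        cum.append(acc)
    selected = []
    for _ in range(n_select):         # one independent spin per selected parent
        r = random.uniform(0, total)
        selected.append(next(i for i, c in enumerate(cum) if r <= c))
    return selected

fitness = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(roulette_wheel(fitness, 6))     # indices of the 6 selected parents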
3.4 Stochastic universal sampling
Stochastic universal sampling [Bak87] provides zero bias and minimum spread. The
individuals are mapped to contiguous segments of a line, such that each individual's
segment is equal in size to its fitness, exactly as in roulette-wheel selection. Here as many
equally spaced pointers are placed over the line as there are individuals to be selected.
Let NPointer be the number of individuals to be selected: then the distance between the
pointers is 1/NPointer and the position of the first pointer is given by a randomly
generated number in the range [0, 1/NPointer].
For 6 individuals to be selected, the distance between the pointers is 1/6 = 0.167. With a
sample random number of 0.1 for the position of the first pointer, the mating population
after selection consists of the individuals:
1, 2, 3, 4, 6, 8.
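A Python sketch of stochastic universal sampling (illustrative only):

import random

def sus(fitness, n_pointer):
    total = sum(fitness)
    dist = total / n_pointer                 # distance between the pointers
    start = random.uniform(0, dist)          # one random offset for all pointers
    selected, acc, i = [], 0.0, 0
    for k in range(n_pointer):
        pointer = start + k * dist
        while acc + fitness[i] < pointer:    # advance to the segment under it
            acc += fitness[i]
            i += 1
        selected.append(i)
    return selected

fitness = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(sus(fitness, 6))   # e.g. [0, 1, 2, 3, 5, 7], i.e. individuals 1, 2, 3, 4, 6, 8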
3.5 Local selection
In local selection every individual resides inside a constrained environment called the local
neighborhood. (In the other selection methods the whole population or subpopulation is the
selection pool or neighborhood.) Individuals interact only with individuals inside this region.
The neighborhood is defined by the structure in which the population is distributed. The
neighborhood can be seen as the group of potential mating partners.
Local selection is part of the local population model, see Section 8.2.
The first step is the selection of the first half of the mating population uniformly at random
(or using one of the other selection algorithms mentioned, for example stochastic universal
sampling or truncation selection). Then a local neighborhood is defined for every selected
individual. Inside this neighborhood the mating partner is selected (best, fitness
proportional, or uniformly at random).
The structure of the neighborhood can be:
linear: full ring, half ring
two-dimensional: full cross, half cross (see Fig. 3-6, left); full star, half star (see Fig. 3-6, right)
three-dimensional and more complex structures: any combination of the above
The distance between possible neighbors together with the structure determines the size of
the neighborhood. Table gives examples for the size of the neighborhood for the given
structures and different distance values.
Fig. 3-6: Two-dimensional neighborhood; left: full and half cross, right: full and half star
Between individuals of a population an 'isolation by distance' exists. The smaller the
neighborhood, the bigger the isolation distance. However, because of overlapping
neighborhoods, propagation of new variants takes place. This assures the exchange of
information between all individuals.

structure of selection    distance 1    distance 2
full ring                 2             4
half ring                 1             2
full star                 8             24
half star                 3             8
3.6 Truncation selection
In truncation selection individuals are sorted according to their fitness. Only the best
individuals are selected as parents. These selected parents produce offspring uniformly at
random. The parameter for truncation selection is the truncation threshold Trunc. Trunc
indicates the proportion of the population to be selected as parents and takes values
ranging from 10% to 50%. Individuals below the truncation threshold do not produce
offspring. The term selection intensity is often used in truncation selection; the equations
below show the relation between the truncation threshold and selection intensity.
Selection intensity:
SelInt(Trunc) = (1/Trunc)·(1/sqrt(2·pi))·e^(-fc^2/2), with the truncation point fc defined by Trunc = 1 - Phi(fc)   (3-7)
Loss of diversity:
LossDiv(Trunc) = 1 - Trunc   (3-8)
Selection variance:
SelVar(Trunc) = 1 - SelInt(Trunc)·(SelInt(Trunc) - fc)   (3-9)
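A Python sketch of truncation selection (illustrative only):

import random

def truncation_selection(fitness, n_select, trunc=0.5):
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    pool = ranked[:max(1, int(trunc * len(fitness)))]   # only the best are parents
    return [random.choice(pool) for _ in range(n_select)]

fitness = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(truncation_selection(fitness, 6, trunc=0.3))      # parents from the top 30% only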
9/13
3.7 Tournament selection
In tournament selection [GD91] a number Tour of individuals is chosen randomly from the
population and the best individual from this group is selected as a parent. This process is
repeated as often as individuals need to be chosen. These selected parents produce
offspring uniformly at random. The parameter for tournament selection is the tournament
size Tour. Tour takes values from 2 to Nind (the number of individuals in the population).
The equations below show the relation between tournament size and selection intensity
[BT95].
Selection intensity:
SelInt(Tour) ≈ sqrt(2·(ln(Tour) - ln(sqrt(4.14·ln(Tour)))))   (3-10)
Loss of diversity:
LossDiv(Tour) = Tour^(-1/(Tour-1)) - Tour^(-Tour/(Tour-1))   (3-11)
(About 50% of the population are lost at tournament size Tour = 5.)
Selection variance:
SelVar(Tour) ≈ 0.918/ln(1.186 + 1.328·Tour)   (3-12)
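A Python sketch of tournament selection (illustrative only):

import random

def tournament_selection(fitness, n_select, tour=2):
    selected = []
    for _ in range(n_select):
        competitors = [random.randrange(len(fitness)) for _ in range(tour)]
        selected.append(max(competitors, key=lambda i: fitness[i]))  # best wins
    return selected

fitness = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(tournament_selection(fitness, 6, tour=3))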
3.8 Comparison of selection schemes
As shown in the previous sections of this chapter, the selection methods behave similarly
when a similar selection intensity is assumed. In other respects, however, the behavior of
the selection methods differs. Thus, the selection methods are compared below on the
parameters loss of diversity (Fig. 3-10) and selection variance, both as functions of the
selection intensity.
3.8.2 Loss of diversity and selection intensity
Fig. 3-10: Dependence of loss of diversity on selection intensity
Truncation selection leads to a much higher loss of diversity for the same selection
intensity compared to ranking and tournament selection. Truncation selection is more likely
to replace less fit individuals with fitter offspring, because all individuals below a certain
fitness threshold have no chance of being selected. Ranking and tournament selection
seem to behave similarly. However, ranking selection works in an area where tournament
selection does not work, because of the discrete character of tournament selection.
3.8.3 Selection variance and selection intensity
For the same selection intensity truncation selection leads to a much smaller selection
variance than ranking or tournament selection. As can be seen clearly, ranking selection
behaves similarly to tournament selection. However, again ranking selection works in an
area where tournament selection does not work, because of the discrete character of
tournament selection. In [BT95] it was proven that the fitness distributions of ranking
selection with SP = 2 and tournament selection with Tour = 2 are identical
(SelInt = 1/sqrt(pi)).
4 Recombination
Recombination produces new individuals by combining the information contained in two or
more parents (the parents come from the mating population). This is done by combining
the variable values of the parents. Depending on the representation of the variables,
different methods must be used.
Section 4.1 describes the discrete recombination. This method can be applied to all
variable representations. Section 4.2 explains methods for real valued variables. Methods
for binary valued variables are described in Section 4.3.
The methods for binary valued variables constitute special cases of the discrete
recombination. These methods can all be applied to integer valued and real valued
variables as well.
4.1 Discrete recombination
Discrete recombination performs an exchange of variable values between the individuals.
For each position the parent who contributes its variable to the offspring is chosen
randomly with equal probability:

VarO_i = VarP1_i · a_i + VarP2_i · (1 - a_i),  a_i in {0, 1} uniformly at random for each i   (4-1)

Discrete recombination generates corners of the hypercube defined by the parents. The
figure shows the geometric effect of discrete recombination.
Consider the following two individuals with 3 variables each (3 dimensions), which will also
be used to illustrate the other types of recombination for real valued variables:
individual 1 12 25 5
individual 2 123 4 34
For each variable the parent who contributes its variable to the offspring is chosen
randomly with equal probability:
sample 1 2 2 1
sample 2 1 2 1
offspring 1 123 4 5
offspring 2 12 4 5
Discrete recombination can be used with any kind of variables (binary, integer, real or
symbols).
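A Python sketch of discrete recombination per equation (4-1) (illustrative only):

import random

def discrete_recombination(p1, p2):
    # each offspring variable comes from one parent with equal probability
    return [random.choice((v1, v2)) for v1, v2 in zip(p1, p2)]

parent1 = [12, 25, 5]
parent2 = [123, 4, 34]
print(discrete_recombination(parent1, parent2))   # e.g. [123, 4, 5]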
4.2 Real valued recombination
The recombination methods in this section can be applied to the recombination of
individuals with real valued variables.

4.2.1 Intermediate recombination
In intermediate recombination the variable values of the offspring are chosen somewhere
around and between the variable values of the parents:

VarO_i = VarP1_i · a_i + VarP2_i · (1 - a_i),  i = 1, ..., Nvar   (4-2)

where a_i is a scaling factor chosen uniformly at random over the interval [-d, 1+d] anew
for each variable.
The value of the parameter d defines the size of the area for possible offspring. A value of
d = 0 defines the area for offspring the same size as the area spanned by the parents. This
method is called (standard) intermediate recombination. Because most variables of the
offspring are not generated on the border of the possible area, the area for the variables
shrinks over the generations. This shrinkage occurs just by using (standard) intermediate
recombination. This effect can be prevented by using a larger value for d. A value of
d = 0.25 ensures (statistically), that the variable area of the offspring is the same as the
variable area spanned by the variables of the parents. See figure 4-2 for a picture of the
area of the variable range of the offspring defined by the variables of the parents.
Fig. 4-2: Area for variable value of offspring compared to parents in intermediate
recombination
Consider again the two example individuals:
individual 1 12 25 5
individual 2 123 4 34
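A Python sketch of intermediate recombination per equation (4-2) (illustrative only; note
that a new scaling factor is drawn for every variable):

import random

def intermediate_recombination(p1, p2, d=0.25):
    offspring = []
    for v1, v2 in zip(p1, p2):
        a = random.uniform(-d, 1 + d)      # new a for each variable
        offspring.append(v1 * a + v2 * (1 - a))
    return offspring

print(intermediate_recombination([12, 25, 5], [123, 4, 34]))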
4.2.2 Line recombination
Line recombination is similar to intermediate recombination, except that only one value of
a is used for all variables of one offspring:

VarO_i = VarP1_i · a + VarP2_i · (1 - a),  a in [-d, 1+d] uniformly at random, one a per offspring   (4-3)

For the value of d the statements given for intermediate recombination are applicable.
individual 1 12 25 5
individual 2 123 4 34
sample 1 0.5
sample 2 0.1
Line recombination can generate any point on the line defined by the parents. Figure 4-4
shows the possible positions of the offspring after line recombination.
4.2.3 Extended line recombination
Extended line recombination generates offspring on the line defined by the two parents,
possibly outside the section between them (up to the borders of the variable domain).
Inside this possible area the offspring are not distributed uniformly at random. The
probability of creating offspring near the parents is high. Only with low probability are
offspring created far away from the parents. If the fitness of the parents is available, then
offspring are more often created in the direction from the worse to the better parent
(directed extended line recombination).
VarO_i = VarP1_i + s_i · r_i · a · (VarP2_i - VarP1_i) / ||VarP1 - VarP2||,  i = 1, ..., Nvar   (4-4)
with a = 2^(-k·u), u in [0, 1] uniformly at random; s_i: direction of the step (±1);
r_i = r · domain_i: range of the step.
The creation of offspring uses features similar to the mutation operator for real valued
variables (see Section 5.1). The parameter a defines the relative step-size, the parameter r
the maximum step-size, and the parameter s the direction of the recombination step.
The parameter k determines the precision used for the creation of the recombination steps.
A larger k produces more small steps. For all values of k the maximum value of a is
a = 1 (u = 0). The minimum value of a depends on k and is a = 2^(-k) (u = 1). Typical
values for the precision parameter k range from 4 to 20.
Fig. 4-5: Possible positions of the offspring after extended line recombination according to
the positions of the parents and the definition area of the variables
A robust value for the parameter r (range of recombination step) is 10% of the domain of
the variable. However, according to the defined domain of the variables or for special cases
this parameter can be adjusted. By selecting a smaller value for r the creation of offspring
may be constrained to a smaller area around the parents.
Extended line recombination is only applicable to real variables (and not binary or integer
variables).
This section describes recombination methods for individuals with binary variables.
Commonly, these methods are called 'crossover'. Thus, the notion 'crossover' will be used
to name the methods.
During the recombination of binary variables only parts of the individuals are exchanged
between the individuals. The individuals are divided at a number of cross points before the
exchange of variables; the number of cross points distinguishes the methods.

4.3.1 Single-point / double-point / multi-point crossover
In single-point crossover one crossover position is selected uniformly at random and the
variables are exchanged between the individuals about this point. Two new offspring are
produced:
individual 1 0 1 1 1 0 0 1 1 0 1 0
individual 2 1 0 1 0 1 1 0 0 1 0 1
crossover position 5
offspring 1 0 1 1 1 0| 1 0 0 1 0 1
offspring 2 1 0 1 0 1| 0 1 1 0 1 0
In double-point crossover two crossover positions are selected uniformly at random and
the variables between these positions are exchanged between the individuals. Again two
new offspring are produced.
Single-point and double-point crossover are special cases of the general method,
multi-point crossover.
individual 1 0 1 1 1 0 0 1 1 0 1 0
individual 2 1 0 1 0 1 1 0 0 1 0 1
offspring 1 0 1| 1 0 1 1| 0 1 1 1| 1
offspring 2 1 0| 1 1 0 0| 0 0 1 0| 0
The idea behind multi-point, and indeed many of the variations on the crossover operator,
is that parts of the chromosome representation that contribute most to the performance of a
particular individual may not necessarily be contained in adjacent substrings [Boo87].
Further, the disruptive nature of multi-point crossover appears to encourage the exploration
of the search space, rather than favouring the convergence to highly fit individuals early in
the search, thus making the search more robust [SDJ91b].
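A Python sketch of multi-point crossover (single- and double-point are the special cases of
1 and 2 cross points; illustrative only):

import random

def multipoint_crossover(p1, p2, n_points=2):
    cuts = sorted(random.sample(range(1, len(p1)), n_points))
    o1, o2 = list(p1), list(p2)
    swap, last = False, 0
    for cut in cuts + [len(p1)]:
        if swap:                          # exchange every second segment
            o1[last:cut], o2[last:cut] = o2[last:cut], o1[last:cut]
        swap, last = not swap, cut
    return o1, o2

ind1 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
ind2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(multipoint_crossover(ind1, ind2, n_points=3))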
4.3.2 Uniform crossover
Uniform crossover generalizes multi-point crossover by making every locus a potential
crossover point. A crossover mask of the same length as the individual structure is created
at random, and the parity of the bits in the mask indicates which parent will supply the
offspring with which bits. This method is identical to discrete recombination,
see Section 4.1.
individual 1 0 1 1 1 0 0 1 1 0 1 0
individual 2 1 0 1 0 1 1 0 0 1 0 1
For each variable the parent who contributes its variable to the offspring is chosen
randomly with equal probability. Here, offspring 1 is produced by taking the bit from
parent 1 if the corresponding mask bit is 1, or the bit from parent 2 if the corresponding
mask bit is 0. Offspring 2 is usually created using the inverse of the mask.
sample 1 0 1 1 0 0 0 1 1 0 1 0
sample 2 1 0 0 1 1 1 0 0 1 0 1
offspring 1 1 1 1 0 1 1 1 1 1 1 1
offspring 2 0 0 1 1 0 0 0 0 0 0 0
Uniform crossover, like multi-point crossover, has been claimed to reduce the bias
associated with the length of the binary representation used and the particular coding for a
given parameter set. This helps to overcome the bias in single-point crossover towards
short substrings without requiring precise understanding of the significance of the individual
bits in the individual's representation. [SDJ91a] demonstrated how uniform crossover may
be parameterized by applying a probability to the swapping of bits. This extra parameter
can be used to control the amount of disruption during recombination without introducing a
bias towards the length of the representation used.
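A Python sketch of uniform crossover, with the swap probability of [SDJ91a] exposed as a
parameter (illustrative only; swap_prob = 0.5 gives the standard operator):

import random

def uniform_crossover(p1, p2, swap_prob=0.5):
    o1, o2 = list(p1), list(p2)
    for i in range(len(p1)):              # every locus is a potential cross point
        if random.random() < swap_prob:
            o1[i], o2[i] = o2[i], o1[i]
    return o1, o2

ind1 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
ind2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(uniform_crossover(ind1, ind2))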
5 Mutation
By mutation individuals are randomly altered. These variations (mutation steps) are mostly
small. They will be applied to the variables of the individuals with a low probability (mutation
probability or mutation rate). Normally, offspring are mutated after being created by
recombination.
For the definition of the mutation steps and the mutation rate two approaches exist:
Both parameters are constant during a whole evolutionary run. Examples are
methods for the mutation of real variables, see Section 5.1 and mutation of binary
variables, see Section 5.2.
One or both parameters are adapted according to previous mutations. Examples are
the methods for the adaptation of mutation step-sizes known from the area of
evolutionary strategies, see Section 5.3.
5.1 Real valued mutation
Mutation of real variables means that randomly created values are added to the variables
with a low probability. Thus, the probability of mutating a variable (mutation rate) and the
size of the change for each mutated variable (mutation step) must be defined.
Similar results are reported in [Bäc93] and [Bäc96] for a binary valued representation. For
unimodal functions a mutation rate of 1/n was the best choice. An increase in the mutation
rate at the beginning, connected with a decrease in the mutation rate to 1/n at the end,
gave only an insignificant acceleration of the search.
The given recommendations for the mutation rate are only correct for separable functions.
However, most real world functions are not fully separable. For these functions no
recommendations for the mutation rate can be given. As long as nothing else is known, a
mutation rate of 1/n is suggested here as well.
The size of the mutation step is usually difficult to choose. The optimal step-size depends
on the problem considered and may even vary during the optimization process. It is known,
that small steps (small mutation steps) are often successful, especially when the individual
is already well adapted. However, larger changes (large mutation steps) can, when
successful, produce good results much quicker. Thus, a good mutation operator should
often produce small step-sizes with a high probability and large step-sizes with a low
probability.
In [MSV93a] and [Müh94] such an operator is proposed (mutation operator of the Breeder
Genetic Algorithm):
VarMut_i = Var_i + s_i · r_i · a_i   (5-1)
with s_i in {-1, +1} uniformly at random; r_i = r · domain_i (r: mutation range, standard
10%); a_i = 2^(-u·k), u in [0, 1] uniformly at random (k: mutation precision).
This mutation algorithm is able to generate most points in the hypercube defined by the
variables of the individual and the range of the mutation (the range of mutation is given by
the value of the parameter r and the domain of the variables). Most mutated individuals will
be generated near the individual before mutation. Only some mutated individuals will be
farther away from the unmutated individual. That means the probability of small step-sizes
is greater than that of bigger steps. The figure tries to give an impression of the results
of this mutation operator.
The parameter k (mutation precision) defines indirectly the minimal possible step-size and
the distribution of mutation steps inside the mutation range. The smallest relative mutation
step-size is 2^(-k), the largest is 2^0 = 1. Thus, the mutation steps are created inside the
area [r·2^(-k), r] (r: mutation range). With a mutation precision of k = 16, the smallest
possible mutation step is r·2^(-16). If the variables of an individual are closer to the
optimum than this smallest step, no further improvement is possible. This can be
circumvented by decreasing the mutation range (restart of the evolutionary run or use of
multiple strategies).
Typical values for the parameters of the mutation operator from equation 5-1 are:

mutation rate ≈ 1/n, mutation range r ≈ 0.1 (10% of the variable domain), mutation precision k = 16   (5-2)
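A Python sketch of this mutation operator following equation (5-1) (illustrative only; the
domain argument, the width of each variable's feasible interval, is an assumption of this
sketch):

import random

def bga_mutation(individual, domain, rate=None, r=0.1, k=16):
    rate = rate if rate is not None else 1.0 / len(individual)
    mutated = list(individual)
    for i, var in enumerate(individual):
        if random.random() < rate:             # mutate with low probability
            s = random.choice((-1, 1))         # direction of the step
            a = 2 ** (-random.random() * k)    # relative step in [2**-k, 1]
            mutated[i] = var + s * (r * domain[i]) * a
    return mutated

print(bga_mutation([12.0, 25.0, 5.0], domain=[200.0, 50.0, 50.0]))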
5.2 Binary mutation
For binary valued individuals mutation means flipping variable values, because every
variable has only two states. Thus, the size of the mutation step is always 1. For every
individual the variable to change is chosen (mostly uniformly at random). For example, for
an individual with 11 variables where variable 4 is mutated:

before mutation: 0 1 1 1 0 0 1 1 0 1 0
after mutation:  0 1 1 0 0 0 1 1 0 1 0
Assuming that the above individual decodes a real number in the bounds [1, 10], the effect
of the mutation depends on the actual coding. Table shows the different numbers of the
individual before and after mutation for binary/gray and arithmetic/logarithmic coding.
However, there is no longer a reason to decode real variables into binary variables.
Powerful mutation operators for real variables are available, see the operator in
Section 5.1. The advantages of these operators were shown in some publications (for
instance [Mic94] and [Dav91]).
5.3 Real valued mutation with adaptation of step-sizes
For the mutation of real variables there exists the possibility to learn the direction and
step-size of successful mutations by adapting these values. These methods are a part of
evolutionary strategies ([Sch81] and [Rec94]) and evolutionary programming ([Fdb95]).
For storing the additional mutation step-sizes and directions, additional variables are added
to every individual. The number of these additional variables depends on the number of
variables n and the method. Each step-size corresponds to one additional variable, each
direction to n additional variables. To store n directions, n^2 additional variables would be
needed.
In addition, the adaptation of n step-sizes requires n generations with the calculation of
multiple individuals each. With n step-sizes and one direction (A II) this adaptation takes
2n generations, for n directions (A I) n^2 generations.
Considering the additional storage space required and the time needed for adaptation, it
can be seen that only the first two methods are useful for practical application. Only
these methods achieve an adaptation with acceptable expenditure. The adaptation of n
directions (A I) is currently only applicable to small problems.
The algorithms for these mutation operators will not be described at this stage. Instead, the
interested reader is directed towards the publications mentioned. An example
implementation is contained in [GEATbx]. Some comments important for the practical use
of these operators are given in the following paragraphs.
The mutation operators with step-size adaptation need a different setup for the evolutionary
algorithm parameters compared to the other algorithms. The adapting operators employ a
small population. Each of these individuals produces a large number of offspring. Only the
best of the offspring are reinserted into the population. All parents will be replaced. The
selection pressure is 1, because all individuals produce the same number of offspring. No
recombination takes place.
When these mutation operators are used, one problem has to be solved: the initial size of
the individual step-sizes. The original publications just give a value of 1. This value is only
suitable for a limited number of artificial test functions and when the domain of all variables
is equal. For practical use the initial step-sizes must be defined depending on the
domain of each variable. Further, a problem-specific scaling of the initial step-sizes should
be possible. To achieve this, the parameter mutation range r can be used, similar to the
real valued mutation operator.
Typical values for the mutation range of the adapting mutation operators are:
(5-3)
A larger value for the mutation range produces larger initial mutation steps. The offspring
are created far away from the parents, and a rough search is performed at the beginning
of a run. A small value for the mutation range leads to a detailed search at the
beginning. Between these two extremes, the best way to solve the problem at hand must
be selected. If the search is too rough, no adaptation takes place. If the initial step sizes
are too small, the search takes extraordinarily long and/or gets stuck in the nearest small
local minimum.
The adapting mutation operators should be especially powerful for the solution of problems
with correlated variables. By the adaptation of step-sizes and directions the correlations
between variables can be learned. Some problems (for instance the Rosenbrock function,
which contains a narrow, curved valley) can be solved very effectively by adapting
mutation operators.
The use of the adapting mutation operators is very difficult (or useless) when the objective
function contains many minima (extrema) or is noisy.
6 Reinsertion
Once the offspring have been produced by selection, recombination and mutation of
individuals from the old population, the fitness of the offspring may be determined. If fewer
offspring are produced than the size of the original population, then the offspring have to
be reinserted into the old population to maintain its size. Similarly, if not all offspring are to
be used at each generation, or if more offspring are generated than the size of the old
population, a reinsertion scheme must be used to determine which individuals are to exist
in the new population.
The used selection method determines the reinsertion scheme: local reinsertion for local
selection and global reinsertion for all other selection methods.
Different schemes of global reinsertion exist:
produce as many offspring as parents and replace all parents by the offspring (pure
reinsertion);
produce fewer offspring than parents and replace parents uniformly at random (uniform
reinsertion);
produce fewer offspring than parents and replace the worst parents (elitist reinsertion);
produce more offspring than needed for reinsertion and reinsert only the best
offspring (fitness-based reinsertion).
Pure reinsertion is the simplest reinsertion scheme. Every individual lives one generation
only. This scheme is used in the simple genetic algorithm. However, it is very likely that
very good individuals are replaced without producing better offspring, and thus good
information is lost.
Elitist reinsertion combined with fitness-based reinsertion prevents this loss of information
and is the recommended method. At each generation, a given number of the least fit
parents is replaced by the same number of the most fit offspring. The fitness-based
reinsertion scheme implements a truncation selection between the offspring before
inserting them into the population (i.e. before they can participate in the reproduction
process). On the other hand, the best individuals can live for many generations. However,
with every generation some new individuals are inserted. It is not checked whether the
parents are replaced by better or worse offspring.
Because parents may be replaced by offspring with a lower fitness, the average fitness of
the population can decrease. However, if the inserted offspring are extremely bad, they will
be replaced with new offspring in the next generation.
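A Python sketch of elitist reinsertion combined with fitness-based reinsertion (illustrative
only; fitness values are passed as plain lists for clarity):

def elitist_reinsertion(parents, parent_fit, offspring, offspring_fit, n_replace):
    # truncation among the offspring: only the n_replace best are inserted
    best_off = sorted(zip(offspring_fit, offspring), reverse=True)[:n_replace]
    # the least fit parents make way; all others survive into the new generation
    keep = sorted(zip(parent_fit, parents), reverse=True)[:len(parents) - n_replace]
    return [ind for _, ind in keep + best_off]

pop, pop_fit = ['a', 'b', 'c', 'd'], [4.0, 3.0, 2.0, 1.0]
children, child_fit = ['x', 'y', 'z'], [3.5, 0.5, 2.5]
print(elitist_reinsertion(pop, pop_fit, children, child_fit, n_replace=2))
# -> ['a', 'b', 'x', 'z']: the two least fit parents were replaced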
In local selection individuals are selected in a bounded neighborhood (see Section 3.5).
The reinsertion of offspring takes place in exactly the same neighborhood. Thus, the
locality of the information is preserved.
The used neighborhood structures are the same as in local selection. The parent of an
individual is the first selected parent in this neighborhood.
For the selection of parents to be replaced and for selection of offspring to reinsert the
following schemes are possible:
perlmonks.org
EA is an umbrella term encompassing ideas used in the more specific fields of genetic programming,
genetic algorithms, and evolutionary computing, so the concepts covered here apply to those fields
as well.
Armed with just these principles, you could implement your own rudimentary (and working) EAs.
You may already have implemented something like this before. However, as they say, the devil's in
the details. It's important to understand how the implementation details affect your EA.
On top of all that, EAs are refreshingly fun to use compared to other forms of analysis, and they
leave plenty of room for creativity. Here are a few examples of real-world applications that I know
about:
Finding circuit layouts that minimize production costs, using graph representations.
Modelling biological principles like cooperation, speciation, specialization.
Finding effective movement strategies for cheap (and dumb) robots.
Creating classifiers and predictors for data sets, often with neural nets.
Generating music/art that satisfies certain aesthetic criteria.
The Basics:
In EAs, we work with a population of individuals, data structures that represent elements of the
problem's solution space. When we talk about representation, we mean the type of internal data
structure the EA uses to store the individuals (array, string, tree, neural net, etc). Representation
does matter, but for the scope of this document's examples, strings and arrays will suffice. As they
are easily available as native data types in any sane language, they are much easier to implement
and conceptualize. The actual encoding of a solution into a data structure is called the individual's
gene. We'll talk a little more about representation later.
Fitness:
If our goal is to find individuals which represent "good" solutions, we should probably be a
little more specific about what we mean by "good." We must have a way of scoring an
individual's effectiveness
for the problem. We call this measurement the individual's fitness (as in survival of the fittest). The
fitness measure should reflect the characteristics you desire in a "good" solution, so the higher an
individual's fitness, the better it demonstrates the traits you want. The fitness measure is always
dependent on the representation you use. It sets your EA's goal.
Commonly, fitness is just a function of the individual's gene data structure. However, the fitness
measure need not be a true function in the mathematical sense. It might be probabilistic, or it might
depend also on other members of the population. It also often involves a model or simulation of the
problem, executed with the individuals of the population.
The Process:
The most basic evolutionary algorithm pseudocode is rather simple:
Create an initial population (usually at random)
Until "done": (exit criteria)
Select some pairs to be parents (selection)
Combine pairs of parents to create offspring (recombination)
Perform some mutation(s) on the offspring (mutation)
Select some population members to be replaced by the new offspring (replacement)
Repeat
This is extremely general, so let's look at each step in a little more detail. As you can imagine, there
are a zillion ways to fill in the implementation details of each step, so I'll only list the most common
ones.
The exit criteria sets the target for the fitness measure, but also usually includes an upper limit on
the number of iterations, in case the evolution gets "stuck." A typical exit criteria might be: "stop
when some individual achieves a fitness of 100, or when we have iterated 10,000 times." We'll talk
more about evolution getting "stuck" later. Sticking with the biology jargon, each iteration of the loop
is called a generation.
Selection and replacement grant breeding rights and cause extinction within the population,
respectively. They are independent of the representation scheme, and should only rely on your
choice of fitness measure. Usually a small fraction of the population are chosen for breeding or
replacement each generation. For simplicity, often the same number of individuals are chosen for
breeding and replacement, although this is not required (causing the population to change in size).
Here are a few of the most common selection and replacement methods:
Random: Choose completely at random, with uniform probability given to each individual
(regardless of fitness).
Absolute: Always breed the n best-fit individuals, and replace the n least-fit individuals. (No
randomness, always a deterministic choice)
Roulette: Pick randomly, but with relative weights proportional to fitness. Higher-fit individuals
have a better chance of getting chosen for breeding, and less-fit individuals have a better
chance of getting chosen for replacement.
Rank: Same as roulette, but make the relative weights proportional to an individual's rank
within the population, not fitness. The least-fit individual has rank 1, while the most-fit has
rank N (the size of the population).
Selection and replacement methods are independent of each other. You could use absolute
replacement with rank selection, for example.
Recombination (or breeding) is the process of using existing pairs of "parent" genes to produce
new "offspring" genes. The details of this operation depend on your representation scheme, but by
far the most common recombination operation is called crossover. Crossover can be used with
string and array representations. It involves making copies of the parents and then swapping a
chunk between the copies. Here's a visual example on two string genes (the chunk in the
middle is swapped):

parent 1:    aaaaaaaaaa        parent 2:    bbbbbbbbbb
offspring 1: aaabbbbaaa        offspring 2: bbbaaaabbb
The concept of crossover can be extended and used in other representations as well. For instance,
a crossover operation on two tree structures might involve the exchange of two subtrees. Common
variations on crossover include swapping chunks from different parts of the two genes or
exchanging more than one chunk.
Mutation is a random process which slightly modifies the gene of an individual. With string genes, a
mutation usually consists of changing a fixed number of characters, or changing each character with
a very low probability (e.g., a 5% chance of changing each character). Other interesting mutations
include lengthening, shortening, or modifying the gene, each with a respective probability.
Here's a small but complete example, evolving bit strings toward all 1s (the ONE-MAX problem):

use strict;
use warnings;
use List::Util qw/shuffle/;

my $str_length = 20;
my $pop_size   = 50;

# initial population: random bit strings, kept sorted worst-to-best by fitness
my @population = sort { fitness($a) <=> fitness($b) }
                 map { rand_string() } 1 .. $pop_size;

my $generations = 0;
while ( $generations++ < 1000 and fitness($population[-1]) != $str_length ) {
    # selection: the 10 most-fit individuals (end of the list) get breeding rights
    my @parents = shuffle @population[-10 .. -1];
    my @children;
    push @children, crossover( splice(@parents, 0, 2) )
        while @parents;

    # mutation: perturb each child
    @children = map { mutate($_) } @children;

    # replacement: the least-fit members make way for the new children
    splice @population, 0, scalar @children;
    @population = sort { fitness($a) <=> fitness($b) } @population, @children;
}
printf "best: %s (fitness %d) after %d generations\n",
       $population[-1], fitness($population[-1]), $generations;

# fitness: the number of 1s in the string (the ONE-MAX objective)
sub fitness {
    return $_[0] =~ tr/1/1/;
}

# crossover: swap a random chunk between copies of the two parents
sub crossover {
    my ($s1, $s2) = @_;
    my ($start, $end) = sort { $a <=> $b } map { int rand length $s1 } 1 .. 2;
    ( substr($s1, $start, $end - $start), substr($s2, $start, $end - $start) )
        = ( substr($s2, $start, $end - $start), substr($s1, $start, $end - $start) );
    return ($s1, $s2);
}

# mutation: flip each bit with 20% probability
sub mutate {
    my $s = shift;
    for (0 .. length($s) - 1) {
        substr($s, $_, 1) = 1 - substr($s, $_, 1) if rand() < 0.2;
    }
    return $s;
}

sub rand_string {
    join "" => map { rand() > 0.5 ? 0 : 1 } 1 .. $str_length;
}
Can you pick out which parts of this code correspond to the parts of the pseudocode? What type of
mutation was used (N-point or probabilistic)? What type of selection and replacement schemes were
used? What percentage of the population gets breeding rights at each generation? What is the exit
criteria? How could this code be made more efficient (there are many ways)? How could the EA
process be modularized? How much harder would this have been to write in C or Java? ;)
Now What? How Do I Choose?
You now probably have a feeling for the wide range of EA building blocks. But there are so many,
how will you choose what's best for a particular problem? What makes them different? It's time for a
little theory...
Fitness Landscapes & Diversity:
One way to think of how EAs solve problems is through hill-climbing. Think of breeding as a
process of exploring the solution space: starting with high-fitness individuals, recombination and
mutation bring new individuals into the population, whose genes are "nearby" the genes of the
parents. Selection and replacement fuel the up-hill part: the new individuals who have a higher
fitness will in turn be explored while the lower ones will eventually be discarded and so on, until you
discover individuals that have the highest fitness of all nearby individuals -- they are at the top of that
particular "hill" of fitness. Notice that "nearness" of other individuals is measured in the number of
mutations and/or recombinations needed to get from here to there. So your choice of mutation and
recombination operators determines the fitness landscape.
On one hand, hill-climbing causes EA populations to slowly cluster near the tops of these hills as
they try to achieve maximum fitness. When most of the population's members are very close to one
another (very few mutations or crossovers apart), their genes are very similar, they have much
genetic material in common, and we say the population is not diverse. Hill-climbing is desired (we do
want to maximize fitness after all), but only in moderation. If it happens too fast, the
whole population may become "stuck" on a small number of fitness hills that are not the highest in
the solution space. Mathematically speaking, these are local optima.
On the other hand, when the population is diverse and spread out in the landscape, you may
combine two "distant" parents to get a child somewhere in the middle, maybe on a new fitness hill.
This allows more fitness hills to be discovered, reducing the chance of getting stuck on a local
optimum.
(You may have noticed that in the ONE-MAX example, there are none of these. There's only one
fitness hill, with the string of all 1s at the top. Its fitness landscape is a 20-dimensional hypercube.
Mutation moves along one or more edges of the cube, and crossover moves to any vertex along the
subcube induced by the parents. Non-trivial problems generally have fitness landscapes that are too
complex to characterize.)
Population diversity is needed to ensure that many fitness hills are encountered. But eventually
diversity must be sacrificed so that good solutions can "climb the hill" to maximize fitness.
Representation Matters, Too!
Mutation and recombination (and therefore the fitness landscape) rely on your choice of
representation scheme. The representation should therefore make mutation and recombination
behave like the biological concepts they represent. For instance, a small change in an individual's
gene should make only a small to moderate change in its fitness characteristics. Likewise, combining
parts of the genes of two individuals should produce an individual that shares some of its parents'
characteristics. However, the result need not be merely an average of the parents; there may be
synergy between different parts of the genes.
In solving difficult problems with EAs, finding a good representation scheme with good
recombination and mutation operations can often be the hardest piece of the puzzle. There is no
magic advice for choosing the "right" representation, and in addition to adhering to these guidelines,
the choice must be feasible to implement.
blokhead
Evolution of a salesman: A complete genetic algorithm
tutorial for Python
towardsdatascience.com/evolution-of-a-salesman-a-complete-genetic-algorithm-tutorial-for-python-6fe5d2b3ca35
Introduction
The problem
In this tutorial, we’ll be using a GA to find a solution to the traveling salesman problem
(TSP). The TSP is described as follows:
“Given a list of cities and the distances between each pair of cities, what is the shortest
possible route that visits each city and returns to the origin city?”
Illustration of a potential solution to the TSP (By Xypron [Public domain], from Wikimedia
Commons)
The approach
Let’s start with a few definitions, rephrased in the context of the TSP:
2. Determine fitness
4. Breed
5. Mutate
6. Repeat
We’ll also create a Fitness class. In our case, we’ll treat the fitness as the inverse of the
route distance. We want to minimize route distance, so a larger fitness score is better.
Based on Rule #2, we need to start and end at the same place, so this extra calculation is
accounted for in line 13 of the distance calculation.
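The article's code listings are not reproduced in this copy, so here is a sketch consistent
with the description (class and method names follow the text; details such as the exact
line numbers cited will differ):

import math
import random

class City:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def distance(self, other):
        # straight-line distance between two cities
        return math.hypot(self.x - other.x, self.y - other.y)

    def __repr__(self):
        return f"({self.x},{self.y})"

class Fitness:
    def __init__(self, route):
        self.route = route

    def routeDistance(self):
        # include the leg back to the start: the route ends where it began
        total = 0.0
        for i, city in enumerate(self.route):
            total += city.distance(self.route[(i + 1) % len(self.route)])
        return total

    def routeFitness(self):
        return 1.0 / self.routeDistance()   # larger fitness = shorter route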
We now can make our initial population (aka first generation). To do so, we need a way to
create a function that produces routes that satisfy our conditions (Note: we’ll create our list
of cities when we actually run the GA at the end of the tutorial). To create an individual, we
randomly select the order in which we visit each city:
This produces one individual, but we want a full population, so let’s do that in our next
function. This is as simple as looping through the createRoute function until we have as
many routes as we want for our population.
Note: we only have to use these functions to create the initial population. Subsequent
generations will be produced through breeding and mutation.
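A sketch of these two helpers (again consistent with the description, assuming the classes
above):

def createRoute(cityList):
    # an individual: one random visiting order over all cities
    return random.sample(cityList, len(cityList))

def initialPopulation(popSize, cityList):
    return [createRoute(cityList) for _ in range(popSize)]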
Determine fitness
Next, the evolutionary fun begins. To simulate our “survival of the fittest”, we can make use
of Fitness to rank each individual in the population. Our output will be an ordered list with
the route IDs and each associated fitness score.
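A sketch of rankRoutes (the article uses numpy/pandas here; plain Python gives the same
ordered list of (route ID, fitness) pairs, best first):

def rankRoutes(population):
    results = {i: Fitness(route).routeFitness() for i, route in enumerate(population)}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)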
Select the mating pool
There are a few options for how to select the parents that will be used to create the next
generation. The most common approaches are either fitness proportionate selection
(aka “roulette wheel selection”) or tournament selection:
Another design feature to consider is the use of elitism. With elitism, the best performing
individuals from the population will automatically carry over to the next generation, ensuring
that the most successful individuals persist.
For the purpose of clarity, we’ll create the mating pool in two steps. First, we’ll use the
output from rankRoutes to determine which routes to select in our selection function.
In lines 3–5, we set up the roulette wheel by calculating a relative fitness weight for each
individual. In line 9, we compare a randomly drawn number to these weights to select our
mating pool. We’ll also want to hold on to our best routes, so we introduce elitism in line 7.
Ultimately, the selection function returns a list of route IDs, which we can use to create
the mating pool in the matingPool function.
Now that we have the IDs of the routes that will make up our mating pool from the
selection function, we can create the mating pool. We’re simply extracting the selected
individuals from our population.
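A sketch of both steps (the line numbers cited in the text refer to the article's original
listing, not this one):

def selection(popRanked, eliteSize):
    # elitism: the best routes are carried over directly
    selectionResults = [popRanked[i][0] for i in range(eliteSize)]
    total = sum(fitness for _, fitness in popRanked)
    for _ in range(len(popRanked) - eliteSize):
        pick = random.uniform(0, total)          # spin the roulette wheel
        acc = 0.0
        for routeId, fitness in popRanked:
            acc += fitness
            if acc >= pick:
                selectionResults.append(routeId)
                break
    return selectionResults

def matingPool(population, selectionResults):
    return [population[i] for i in selectionResults]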
Breed
With our mating pool created, we can create the next generation in a process called
crossover (aka “breeding”). If our individuals were strings of 0s and 1s and our two rules
didn’t apply (e.g., imagine we were deciding whether or not to include a stock in a portfolio),
we could simply pick a crossover point and splice the two strings together to produce an
offspring.
However, the TSP is unique in that we need to include all locations exactly one time. To
abide by this rule, we can use a special breeding function called ordered crossover. In
ordered crossover, we randomly select a subset of the first parent string (see line 12 in
breed function below) and then fill the remainder of the route with the genes from the
second parent in the order in which they appear, without duplicating any genes in the
selected subset from the first parent (see line 15 in breed function below).
Illustration of ordered crossover (credit: Lee Jacobson)
Next, we’ll generalize this to create our offspring population. In line 5, we use elitism to
retain the best routes from the current population. Then, in line 8, we use the breed
function to fill out the rest of the next generation.
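A sketch of ordered crossover and its population-wide wrapper, consistent with the
description (the mating pool lists the elite routes first, since selection put them first):

def breed(parent1, parent2):
    # pick a random subset of parent 1 ...
    geneA = int(random.random() * len(parent1))
    geneB = int(random.random() * len(parent1))
    start, end = min(geneA, geneB), max(geneA, geneB)
    childP1 = parent1[start:end]
    # ... and fill the rest with parent 2's cities, in their original order
    childP2 = [city for city in parent2 if city not in childP1]
    return childP1 + childP2

def breedPopulation(matingpool, eliteSize):
    children = matingpool[:eliteSize]            # elitism: retain the best routes
    pool = random.sample(matingpool, len(matingpool))
    for i in range(len(matingpool) - eliteSize):
        children.append(breed(pool[i], pool[len(matingpool) - i - 1]))
    return children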
Mutate
However, since we need to abide by our rules, we can’t drop cities. Instead, we’ll use swap
mutation. This means that, with specified low probability, two cities will swap places in our
route. We’ll do this for one individual in our mutate function:
Next, we can extend the mutate function to run through the new population.
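A sketch of swap mutation for one route and for the whole population:

def mutate(individual, mutationRate):
    route = list(individual)
    for swapped in range(len(route)):
        if random.random() < mutationRate:       # low-probability swap
            swapWith = int(random.random() * len(route))
            route[swapped], route[swapWith] = route[swapWith], route[swapped]
    return route

def mutatePopulation(population, mutationRate):
    return [mutate(ind, mutationRate) for ind in population]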
Repeat
We’re almost there. Let’s pull these pieces together to create a function that produces a
new generation. First, we rank the routes in the current generation using rankRoutes . We
then determine our potential parents by running the selection function, which allows us
to create the mating pool using the matingPool function. Finally, we then create our new
generation using the breedPopulation function and then applying mutation using the
mutatePopulation function.
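Chaining the pieces gives one generation step (a sketch):

def nextGeneration(currentGen, eliteSize, mutationRate):
    popRanked = rankRoutes(currentGen)
    selectionResults = selection(popRanked, eliteSize)
    matingpool = matingPool(currentGen, selectionResults)
    children = breedPopulation(matingpool, eliteSize)
    return mutatePopulation(children, mutationRate)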
Evolution in motion
We finally have all the pieces in place to create our GA! All we need to do is create the
initial population, and then we can loop through as many generations as we desire. Of
course we also want to see the best route and how much we’ve improved, so we capture
the initial distance in line 3 (remember, distance is the inverse of the fitness), the final
distance in line 8, and the best route in line 9.
First, we need a list of cities to travel between. For this demonstration, we’ll create a list of
25 random cities (a seemingly small number of cities, but brute force would have to test
over 300 sextillion routes!):
Then, running the genetic algorithm is one simple line of code. This is where art meets
science; you should see which assumptions work best for you. In this example, we have
100 individuals in each generation, keep 20 elite individuals, use a 1% mutation rate for a
given gene, and run through 500 generations:
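A sketch of the main function and the example run (again, the line numbers cited in the
text refer to the article's original listing):

def geneticAlgorithm(population, popSize, eliteSize, mutationRate, generations):
    pop = initialPopulation(popSize, population)
    print("Initial distance: " + str(1 / rankRoutes(pop)[0][1]))
    for _ in range(generations):
        pop = nextGeneration(pop, eliteSize, mutationRate)
    print("Final distance: " + str(1 / rankRoutes(pop)[0][1]))
    bestRouteIndex = rankRoutes(pop)[0][0]
    return pop[bestRouteIndex]

cityList = [City(x=int(random.random() * 200), y=int(random.random() * 200))
            for _ in range(25)]

bestRoute = geneticAlgorithm(population=cityList, popSize=100, eliteSize=20,
                             mutationRate=0.01, generations=500)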
It’s great to know our starting and ending distance and the proposed route, but we would be
remiss not to see how our distance improved over time. With a simple tweak to our
geneticAlgorithm function, we can store the shortest distance from each generation in a
progress list and then plot the results.
Run the GA in the same way as before, but now using the newly created
geneticAlgorithmPlot function:
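A sketch of the plotting variant (assumes matplotlib is available):

import matplotlib.pyplot as plt

def geneticAlgorithmPlot(population, popSize, eliteSize, mutationRate, generations):
    pop = initialPopulation(popSize, population)
    progress = [1 / rankRoutes(pop)[0][1]]       # best distance per generation
    for _ in range(generations):
        pop = nextGeneration(pop, eliteSize, mutationRate)
        progress.append(1 / rankRoutes(pop)[0][1])
    plt.plot(progress)
    plt.ylabel("Distance")
    plt.xlabel("Generation")
    plt.show()

geneticAlgorithmPlot(population=cityList, popSize=100, eliteSize=20,
                     mutationRate=0.01, generations=500)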
Sample output from the geneticAlgorithmPlot function
Conclusion
I hope this was a fun, hands-on way to learn how to build your own GA. Try it for yourself
and see how short of a route you can get. Or go further and try to implement a GA on
another problem set; see how you would change the breed and mutate functions to
handle other types of chromosomes. We’re just scratching the surface here!