History Genetic Algorithms
In 1992 John Koza used a genetic algorithm to evolve programs to perform certain
tasks. He called his method "genetic programming" (GP). LISP programs were used,
because programs in this language can be expressed in the form of a "parse tree", which is
the object the GA works on.
Chromosome
All living organisms consist of cells. In each cell there is the same set of chromosomes.
Chromosomes are strings of DNA and serve as a model for the whole organism. A
chromosome consists of genes, blocks of DNA. Each gene encodes a particular protein.
Basically, it can be said that each gene encodes a trait, for example eye color. Possible
settings for a trait (e.g. blue, brown) are called alleles. Each gene has its own position in
the chromosome. This position is called the locus.
The complete set of genetic material (all chromosomes) is called the genome. A particular
set of genes in the genome is called the genotype. The genotype, together with later
development after birth, is the basis for the organism's phenotype, its physical and mental
characteristics, such as eye color, intelligence etc.
Reproduction
During reproduction, recombination (or crossover) occurs first. Genes from the parents
combine in some way to form a whole new chromosome. The newly created offspring can
then be mutated. Mutation means that the elements of DNA are changed slightly. These
changes are mainly caused by errors in copying genes from the parents.
Search Space
When we are solving a problem, we are usually looking for a solution that is the best
among the others. The space of all feasible solutions (that is, the set of objects among
which the desired solution lies) is called the search space (also state space). Each point
in the search space represents one feasible solution. Each feasible solution can be "marked"
by its value or fitness for the problem. We are looking for our solution, which is one point
(or more) among the feasible solutions - that is, one point in the search space.
Looking for a solution is then equivalent to looking for some extreme (minimum or
maximum) in the search space. The whole search space may be known at the time of
solving a problem, but usually we know only a few points from it and we generate
other points as the process of finding a solution continues.
The problem is that the search can be very complicated. One does not know where to
look for the solution or where to start. There are many methods for finding some
suitable solution (i.e. not necessarily the best solution), for example hill climbing, tabu
search, simulated annealing and genetic algorithms. The solution found by these methods
is often considered a good solution, because it is often not possible to prove what the
real optimum is.
NP-hard Problems
Examples of difficult problems that cannot be solved in the "traditional" way are
NP problems.
There are many tasks for which we know fast (polynomial) algorithms. There are also
some problems that cannot be solved algorithmically. For some problems it has been
proved that they are not solvable in polynomial time.
But there are many important tasks for which it is very difficult to find a solution, but
once we have it, it is easy to check the solution. This fact led to the notion of NP-complete
problems. NP stands for nondeterministic polynomial, and it means that it is possible to
"guess" the solution (by some nondeterministic algorithm) and then check it, both in
polynomial time. If we had a machine that could guess, we would be able to find a
solution in reasonable time.
For simplicity, the study of NP-complete problems is restricted to problems where the
answer can be yes or no. Because there are tasks with complicated outputs, a class of
problems called NP-hard problems has been introduced. This class is not as limited as
the class of NP-complete problems.
Today nobody knows whether some faster exact algorithm exists for these problems.
Proving or disproving this remains a big task for new researchers (and maybe you! :-)).
Today many people think that such an algorithm does not exist, and so they are looking
for alternative methods. Genetic algorithms are one example of such methods.
Basic Description
Example
As you already know from the chapter about search space, problem solving can
often be expressed as looking for an extreme of a function. This is exactly what the
problem shown here is. Some function is given and the GA tries to find the minimum
of the function.
You can try to run the genetic algorithm in the following applet by pressing the
Start button. The graph represents a search space and the vertical lines represent
solutions (points in the search space). The red line is the best solution, the green
lines are the other ones.
The Start button starts the algorithm, Step performs one step (i.e. forming one new
generation), Stop stops the algorithm and Reset resets the population.
Outline of the Basic Genetic Algorithm
Some Comments
As you can see, the outline of the basic GA is very general. There are many things that
can be implemented differently for various problems.
The first question is how to create chromosomes and what type of encoding to choose.
Connected with this are crossover and mutation, the two basic operators of GA. Encoding,
crossover and mutation are introduced in the next chapter.
The next question is how to select parents for crossover. This can be done in many ways,
but the main idea is to select the better parents (in the hope that the better parents will
produce better offspring). You may also think that making the new population only from
new offspring can cause the loss of the best chromosome from the last population. This is
true, so so-called elitism is often used. This means that at least one best solution is copied
without changes to the new population, so the best solution found can survive to the end
of the run.
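The comments above can be sketched as one small program. This is only a minimal illustration, not the outline from the original tutorial: the function name run_ga, the toy fitness count_ones (the number of 1 bits, to be maximized) and all default parameter values are assumptions chosen for the example. It uses binary chromosomes, fitness-proportional selection, single point crossover, bit mutation and elitism with one elite chromosome.

```python
import random

def run_ga(fitness, n_bits=16, pop_size=20, generations=50,
           p_cross=0.9, p_mut=0.01, seed=0):
    """Generational GA on bit strings; all defaults are illustrative."""
    rng = random.Random(seed)
    # Start from a random population of binary chromosomes.
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        # Elitism: copy the best chromosome unchanged.
        new_population = [population[scores.index(max(scores))]]
        # Tiny offset keeps the weights valid if every fitness is zero.
        weights = [s + 1e-9 for s in scores]
        while len(new_population) < pop_size:
            # Selection: fitter chromosomes are more likely to be parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            child = mother
            if rng.random() < p_cross:
                point = rng.randint(1, n_bits - 1)
                child = mother[:point] + father[point:]
            # Mutation: flip each bit with a small probability.
            child = "".join(
                ("1" if bit == "0" else "0") if rng.random() < p_mut else bit
                for bit in child
            )
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

def count_ones(chromosome):
    """Toy fitness (an assumption): the number of 1 bits."""
    return chromosome.count("1")

best = run_ga(count_ones)
print(best, count_ones(best))
```

The fitness must be non-negative for this kind of selection to work. Because the elite chromosome is always copied, the best fitness in the population never decreases from one generation to the next.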
V. Operators of GA
Overview
As you can see from the genetic algorithm outline, crossover and mutation are the
most important parts of the genetic algorithm. The performance is influenced mainly by
these two operators. Before we explain more about crossover and mutation, some
information about chromosomes will be given.
Encoding of a Chromosome
The chromosome should in some way contain information about the solution that it
represents. The most common way of encoding is a binary string. The chromosome could
then look like this:
Chromosome 1 1101100100110110
Chromosome 2 1101111000011110
Each chromosome has one binary string. Each bit in this string can represent some
characteristic of the solution. Or the whole string can represent a number - this has been
used in the basic GA applet.
Of course, there are many other ways of encoding. The choice depends mainly on the
problem being solved. For example, one can directly encode integer or real numbers;
sometimes it is useful to encode permutations, and so on.
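As a small illustration of a whole string representing a number, the sketch below reads a binary chromosome as an unsigned integer and, optionally, scales it into an interval. The function name decode and the linear scaling to [low, high] are assumptions for this example; the applet may have used a different mapping.

```python
def decode(chromosome: str, low: float, high: float) -> float:
    """Map a binary string linearly onto the interval [low, high]."""
    as_int = int(chromosome, 2)            # read the bits as an unsigned integer
    max_int = 2 ** len(chromosome) - 1     # value of the all-ones string
    return low + (high - low) * as_int / max_int

# Chromosome 1 from the text, read as a plain integer.
print(int("1101100100110110", 2))  # 55606
# The same chromosome scaled into an assumed search interval [-1, 1].
print(decode("1101100100110110", -1.0, 1.0))
```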
Crossover
After we have decided what encoding we will use, we can proceed to crossover.
Crossover selects genes from the parent chromosomes and creates a new offspring. The
simplest way to do this is to choose a crossover point at random, copy everything
before this point from the first parent and then copy everything after the crossover point
from the second parent.
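A minimal sketch of this simplest crossover on binary strings follows. The function name and the optional point argument are assumptions made for the example; when no point is given, one is chosen at random so that each parent contributes at least one bit.

```python
import random

def single_point_crossover(parent1, parent2, point=None, rng=random):
    """Copy parent1 up to the crossover point, then parent2 after it."""
    if point is None:
        point = rng.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:]

# Illustrative 8-bit parents with the crossover point after the fourth bit.
print(single_point_crossover("11001011", "11011111", point=4))  # 11001111
```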
Mutation
After crossover is performed, mutation takes place. Its purpose is to prevent all solutions
in the population from falling into a local optimum of the problem being solved. Mutation
randomly changes the new offspring. For binary encoding we can switch a few randomly
chosen bits from 1 to 0 or from 0 to 1. Mutation can then look like this:
Mutation, like crossover, depends on the encoding. For example, when we are
encoding permutations, mutation could mean exchanging two genes.
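For binary encoding, the mutation described above can be sketched as follows. The function name and the per-bit probability p_mut are assumptions for this example; each bit is flipped independently with that probability.

```python
import random

def bit_flip_mutation(chromosome, p_mut, rng=random):
    """Flip each bit of a binary string independently with probability p_mut."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_mut else bit
        for bit in chromosome
    )

# With a small probability, most offspring stay unchanged or differ in a bit or two.
print(bit_flip_mutation("1101111000011110", p_mut=0.05))
```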
VI. GA Example
Minimum of Function
About the Problem
As you already know from the chapter about search space, problem solving can often be
expressed as looking for an extreme of a function. This is exactly what the problem shown
here is.
Example
You can try to run the genetic algorithm in the following applet by pressing the Start
button. The graph represents a search space and the vertical lines represent solutions
(points in the search space). The red line is the best solution, the green lines are the other
ones. Above the graph the old and new populations are displayed. Each population consists
of binary chromosomes - red and blue points mean zeros and ones. In the applet you can
see the process of forming the new population step by step.
The Start button starts the algorithm, Step performs one step (i.e. forming one new
generation), Stop stops the algorithm and Reset resets the population.
We suggest that you start by pressing the Step button and watching how the GA works in
detail. The outline of the GA has been introduced in one of the previous chapters. First
you can see elitism and then the forming of new offspring by crossover and mutation until
a new population is completed.
VII. Parameters of GA
Crossover and Mutation Probability
There are two basic parameters of GA - crossover probability and mutation probability.
Crossover probability says how often crossover is performed. If there is no crossover,
the offspring is an exact copy of its parents.
Mutation probability says how often parts of a chromosome are mutated. If there is
no mutation, the offspring is taken after crossover (or copy) without any change. If mutation
is performed, part of the chromosome is changed. If the mutation probability is 100%, the
whole chromosome is changed; if it is 0%, nothing is changed.
Mutation is used to prevent the GA from falling into a local extreme, but it should not occur
very often, because then the GA will in fact turn into random search.
Other Parameters
There are also some other parameters of GA. Another important parameter is population
size.
Population size says how many chromosomes are in the population (in one generation). If
there are too few chromosomes, the GA has few possibilities to perform crossover and
only a small part of the search space is explored. On the other hand, if there are too many
chromosomes, the GA slows down. Research shows that beyond some limit (which depends
mainly on the encoding and the problem) it is not useful to increase the population size,
because it does not make solving the problem faster.
Some recommendations for all parameters can be found in one of the following chapters.
Example
Here you can see an example similar to the previous one. But here you can try to change
the crossover and mutation probability. You can also control elitism.
On the graph below you can see the performance of the GA. Red is the best solution, blue
is the average value (fitness) of the whole population.
Try to change the parameters and see how the GA behaves.
The problem is again the same - looking for an extreme of a function. But here you can
define your own 2D function.
Example
The graph represents the search space and the lines represent solutions (points in the
search space). The red line is the best solution, the blue lines are the other ones.
You can enter your own function in the text field below the graph (after a change press
Enter or the Change button). Below it you can define the limits of the function. The function
can consist of x, y, pi, e, (, ), /, *, +, -, !, ^ and the functions abs, acos, acosh, asin, asinh,
atan, atanh, cos, cosh, ln, log, sin, sinh, sqr, sqrt, tan and tanh.
The graph can be rotated by dragging the mouse over it.
You can also change the crossover and mutation probability. Checkboxes control elitism
and whether a minimum or a maximum is sought.
Try to change the function and see how the GA works.
IX. Selection
Introduction
As you already know from the GA outline, chromosomes are selected from the
population to be parents for crossover. The problem is how to select these chromosomes.
According to Darwin's theory of evolution, the best ones should survive and create new
offspring. There are many methods for selecting the best chromosomes, for example
roulette wheel selection, Boltzmann selection, tournament selection, rank selection,
steady-state selection and some others.
Roulette Wheel Selection
Parents are selected according to their fitness. The better the chromosomes are, the more
chances they have to be selected. Imagine a roulette wheel on which all the chromosomes
in the population are placed, each with a section sized according to its fitness, as in the
following picture.
Then a marble is thrown onto the wheel and selects a chromosome. Chromosomes with
bigger fitness will be selected more often.
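The wheel and the marble can be sketched as a running sum over the fitness values. This is a minimal illustration with assumed names (roulette_wheel_select, marble); it requires non-negative fitness values with a positive total.

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    marble = rng.random() * total      # where the marble lands on the wheel
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit                 # end of this chromosome's section
        if marble < running:
            return chromosome
    return population[-1]              # guard against floating-point round-off

rng = random.Random(42)
picks = [roulette_wheel_select(["A", "B"], [9.0, 1.0], rng) for _ in range(1000)]
print(picks.count("A"), picks.count("B"))
```

In the usage lines, chromosome A occupies nine tenths of the wheel, so it should be picked in roughly nine draws out of ten.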
Rank Selection
The previous selection method has problems when the fitness values differ very much. For
example, if the best chromosome's fitness takes up 90% of the roulette wheel, then the
other chromosomes will have very few chances to be selected.
Rank selection first ranks the population and then every chromosome receives its fitness
from this ranking. The worst will have fitness 1, the second worst 2 etc., and the best will
have fitness N (the number of chromosomes in the population).
You can see in the following picture how the situation changes after changing the fitness
to the rank order.
After this, all the chromosomes have a chance to be selected. But this method can lead to
slower convergence, because the best chromosomes do not differ so much from the other
ones.
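A minimal sketch of the ranking step, with assumed function names: the raw fitness values are replaced by ranks 1 to N, and the ranks are then used as weights for a roulette-style draw.

```python
import random

def rank_fitnesses(fitnesses):
    """Replace raw fitness by rank: the worst gets 1, the best gets N."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    ranks = [0] * len(fitnesses)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_select(population, fitnesses, rng=random):
    """Roulette-style draw that uses the ranks as weights."""
    return rng.choices(population, weights=rank_fitnesses(fitnesses), k=1)[0]

# The best chromosome holds 90% of the raw fitness but only 4/10 of the ranks.
print(rank_fitnesses([90, 5, 3, 2]))  # [4, 3, 2, 1]
```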
Steady-State Selection
This is not a particular method of selecting parents. The main idea of this selection is that
a big part of the chromosomes should survive to the next generation.
The GA then works in the following way. In every generation a few (good - with high
fitness) chromosomes are selected for creating new offspring. Then some (bad - with low
fitness) chromosomes are removed and the new offspring are put in their place. The rest
of the population survives to the new generation.
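One generation of this scheme might look like the sketch below. It makes several simplifying assumptions that are not from the text: the good chromosomes are simply the top-ranked ones, the caller supplies the make_child function, and exactly n_new of the worst chromosomes are replaced. The population needs at least two chromosomes.

```python
def steady_state_step(population, fitness, make_child, n_new=2):
    """Replace the n_new worst chromosomes with children of the best ones."""
    ranked = sorted(population, key=fitness, reverse=True)   # best first
    good = ranked[:max(2, n_new)]                            # parents to breed from
    children = [
        make_child(good[i % len(good)], good[(i + 1) % len(good)])
        for i in range(n_new)
    ]
    survivors = ranked[:len(ranked) - n_new]                 # drop the worst
    return survivors + children

population = ["1111", "1100", "1000", "0000"]
new_population = steady_state_step(
    population,
    fitness=lambda c: c.count("1"),
    make_child=lambda a, b: a[:2] + b[2:],   # fixed-point crossover, for illustration
    n_new=1,
)
print(new_population)  # ['1111', '1100', '1000', '1100']
```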
Elitism
The idea of elitism has already been introduced. When creating a new population by
crossover and mutation, there is a big chance that we will lose the best chromosome.
Elitism is the name of the method that first copies the best chromosome (or a few best
chromosomes) to the new population. The rest is done in the classical way. Elitism can very
rapidly increase the performance of a GA, because it prevents losing the best found solution.
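A minimal sketch of elitism, assuming the offspring have already been produced by crossover and mutation. The function name and the n_elite parameter are illustrative.

```python
def with_elitism(old_population, offspring, fitness, n_elite=1):
    """Build the new population: the elite first, then the offspring."""
    elite = sorted(old_population, key=fitness, reverse=True)[:n_elite]
    # Keep the population size constant by trimming the offspring.
    return elite + offspring[:len(old_population) - n_elite]

old = ["1110", "0001", "0101"]
offspring = ["0000", "0011", "1000"]
print(with_elitism(old, offspring, fitness=lambda c: c.count("1")))
# ['1110', '0000', '0011']
```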
X. Encoding
Introduction
Encoding of chromosomes is one of the first problems you face when you start to solve a
problem with GA. The encoding depends very much on the problem.
This chapter introduces some encodings that have already been used with
some success.
Binary Encoding
Binary encoding is the most common, mainly because the first works on GA used this
type of encoding.
Chromosome A 101100101100101011100101
Chromosome B 111111100000110000011111
Binary encoding gives many possible chromosomes even with a small number of alleles.
On the other hand, this encoding is often not natural for many problems and sometimes
corrections must be made after crossover and/or mutation.
Permutation Encoding
Chromosome A 1 5 3 2 6 4 7 9 8
Chromosome B 8 5 6 7 2 3 1 4 9
Permutation encoding is only useful for ordering problems. Even for these problems,
some types of crossover and mutation require corrections to leave the chromosome
consistent (i.e. so that it still contains a valid sequence).
Value Encoding
Direct value encoding can be used in problems where some complicated values, such as
real numbers, are used. Using binary encoding for this type of problem would be very
difficult.
In value encoding, every chromosome is a string of some values. Values can be anything
connected to the problem, from numbers, real numbers or characters to some complicated
objects.
Value encoding is very good for some special problems. On the other hand, for this
encoding it is often necessary to develop some new crossover and mutation specific to the
problem.
Tree Encoding
Tree encoding is used mainly for evolving programs or expressions, i.e. for genetic
programming.
(Figure: example tree chromosomes A and B)
Crossover and mutation are the two basic operators of GA. The performance of a GA
depends very much on them. The type and implementation of the operators depend on the
encoding and also on the problem.
There are many ways to do crossover and mutation. This chapter gives only some
examples and suggestions on how to do it for several encodings.
Binary Encoding
Crossover
Single point crossover - one crossover point is selected, the binary string from the
beginning of the chromosome to the crossover point is copied from the first parent,
and the rest is copied from the second parent
11001011+11011111 = 11001111
Two point crossover - two crossover points are selected, the binary string from the
beginning of the chromosome to the first crossover point is copied from the first parent,
the part from the first to the second crossover point is copied from the second
parent, and the rest is copied from the first parent
Uniform crossover - bits are randomly copied from the first or from the second
parent
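The two point and uniform variants from the list above can be sketched as follows. The function names are assumptions, and the all-zeros and all-ones parents are chosen only so that the origin of every bit in the offspring is visible.

```python
import random

def two_point_crossover(parent1, parent2, first, second):
    """Take the middle segment from parent2 and both ends from parent1."""
    return parent1[:first] + parent2[first:second] + parent1[second:]

def uniform_crossover(parent1, parent2, rng=random):
    """Copy each bit at random from the first or from the second parent."""
    return "".join(rng.choice(pair) for pair in zip(parent1, parent2))

print(two_point_crossover("00000000", "11111111", 2, 6))  # 00111100
print(uniform_crossover("00000000", "11111111"))          # random mix of 0s and 1s
```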
Mutation
Permutation Encoding
Crossover
Single point crossover - one crossover point is selected, the permutation is copied
from the first parent up to this point, then the second parent is scanned and each
number that is not yet in the offspring is added
Note: there are more ways to produce the rest after the crossover point
(1 2 3 4 5 6 7 8 9) + (4 5 3 6 8 9 7 2 1) = (1 2 3 4 5 6 8 9 7)
Mutation
(1 2 3 4 5 6 8 9 7) => (1 8 3 4 5 6 2 9 7)
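Both permutation examples above can be reproduced with a short sketch. The function names are assumptions, and the crossover point and the swapped positions are passed in explicitly so that the result is deterministic; the mutation is read here as an exchange of two genes, which is what the example shows.

```python
def permutation_crossover(parent1, parent2, point):
    """Copy parent1 up to the point, then add parent2's unused numbers in order."""
    head = list(parent1[:point])
    tail = [gene for gene in parent2 if gene not in head]
    return head + tail

def swap_mutation(permutation, i, j):
    """Exchange the genes at positions i and j (0-based)."""
    mutated = list(permutation)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

child = permutation_crossover([1, 2, 3, 4, 5, 6, 7, 8, 9],
                              [4, 5, 3, 6, 8, 9, 7, 2, 1], point=5)
print(child)                       # [1, 2, 3, 4, 5, 6, 8, 9, 7]
print(swap_mutation(child, 1, 6))  # [1, 8, 3, 4, 5, 6, 2, 9, 7]
```

Both operators always return a valid permutation, so no correction step is needed afterwards.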
Value Encoding
Crossover
Mutation
Adding a small number (for real value encoding) - a small number is added to (or
subtracted from) selected values
(1.29 5.68 2.86 4.11 5.55) => (1.29 5.68 2.73 4.22 5.55)
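A minimal sketch of this mutation for real value encoding. The function name, the per-value probability p_mut and the step bound max_step are assumptions for the example, not values from the text.

```python
import random

def small_step_mutation(values, p_mut=0.2, max_step=0.15, rng=random):
    """Add a small random number in [-max_step, max_step] to selected values."""
    return [
        v + rng.uniform(-max_step, max_step) if rng.random() < p_mut else v
        for v in values
    ]

print(small_step_mutation([1.29, 5.68, 2.86, 4.11, 5.55]))
```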
Tree Encoding
Crossover
Tree crossover - one crossover point is selected in both parents, the parents are
divided at that point and the parts below the crossover points are exchanged to
produce new offspring
Mutation
Changing operator, number - selected nodes are changed
The travelling salesman problem (TSP) has already been mentioned in one of the previous
chapters. To repeat it, there are cities and given distances between them. The travelling
salesman has to visit all of them, but he does not want to travel very much. The task is to
find a sequence of cities that minimizes the travelled distance. In other words, find a
minimal Hamiltonian tour in a complete graph of N nodes.
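The quantity a GA has to minimize here is the length of the closed tour. The sketch below assumes that the cities are points in the plane and that the distances are Euclidean; the text itself only says that the distances are given. A chromosome is a permutation of city indices.

```python
import math

def tour_length(tour, cities):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

cities = [(0, 0), (0, 3), (4, 3), (4, 0)]   # corners of a 4 x 3 rectangle
print(tour_length([0, 1, 2, 3], cities))    # 14.0, around the rectangle
print(tour_length([0, 2, 1, 3], cities))    # 18.0, crossing both diagonals
```

Since the GA in this tutorial favours higher fitness, a shorter tour has to be mapped to a higher fitness value, for example by using the negative or the reciprocal of the length.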
Implementation
You can select the crossover and mutation type. Their meaning is described below.
Crossover
Mutation
Example
The following applet shows a GA on the TSP. The "Change View" button switches the view
between the whole population and the best solution. You can add and remove cities by
clicking on the graph. After adding or deleting a city, a random tour will appear, because
a new population with new chromosomes is created. Also note that we are solving the TSP
on a complete graph.
Try to run the GA with different crossover and mutation types and note how the
performance (and speed - add more cities to see it) of the GA changes.
XIII. Recommendations
Parameters of GA
This chapter should give you some basic recommendations if you have decided to
implement your own genetic algorithm. These recommendations are very general. You will
probably want to experiment with your own GA for a specific problem, because today there
is no general theory that would describe the parameters of GA for any problem.
The recommendations are often results of empirical studies of GAs, which were often
performed only on binary encoding.
Crossover rate
The crossover rate should generally be high, about 80%-95%. (However, some results show
that for some problems a crossover rate of about 60% is the best.)
Mutation rate
On the other hand, the mutation rate should be very low. The best rates reported are
about 0.5%-1%.
Population size
It may be surprising that a very big population size usually does not improve the
performance of a GA (in the sense of the speed of finding a solution). A good population
size is about 20-30, although sometimes sizes of 50-100 are reported as best. Some
research also shows that the best population size depends on the encoding, specifically on
the size of the encoded string. This means that if you have a chromosome with 32 bits, the
population should be, say, 32, but surely two times more than the best population size for
a chromosome with 16 bits.
Selection
Basic roulette wheel selection can be used, but sometimes rank selection can be better.
Check the chapter about selection for advantages and disadvantages. There are also some
more sophisticated methods, which change the parameters of selection during the run of
the GA. Basically, they behave like simulated annealing. Elitism should certainly be used
(unless you use another method for saving the best found solution). You can also try
steady-state selection.
Encoding
The encoding depends on the problem and also on the size of the instance of the problem.
Check the chapter about encoding for some suggestions or look at other resources.
Crossover and mutation type
The operators depend on the encoding and on the problem. Check the chapter about
operators for some suggestions. You can also check other sites.
Applications of GA
Genetic algorithms have been used for difficult problems (such as NP-hard problems), for
machine learning and also for evolving simple programs. They have also been used for
some art, for evolving pictures and music.
They are also easy to implement. Once you have a GA, you just have to write a new
chromosome (just one object) to solve another problem. With the same encoding you just
change the fitness function and that is all. On the other hand, choosing the encoding and
the fitness function can be difficult.
A disadvantage of GAs is their computational time. They can be slower than some other
methods. But with today's computers this is not such a big problem.
To get an idea about problems solved by GA, here is a short list of some applications: