Genetic Algorithms: Tutorial ICSB 2007
Genetic Algorithms
Overview
Fundamentals
Genetic algorithms
Modelling Gene Regulatory Networks (GRNs)
Evolution of biological clocks with GRNs
Evolution in NetBuilder
STRI, University of Hertfordshire 2
Part 1: Fundamentals
Biological Evolution
Evolution = change in the gene pool of a population over time
Gene = hereditary unit that can be passed on unaltered for many generations
Gene pool = set of all genes in a species or population
Example: English moth
Two color morphs: light and dark
1848: dark moths <= 2% of the population
Frequency of the dark morph increased: by 1898, 95% were dark in Manchester and other highly industrialized areas
Soot from factories darkened the birch trees the moths landed on
Birds could see the lighter colored moths better and ate more of them, so more dark moths survived
http://www.talkorigins.org/faqs/faq-intro-to-biology.html
Natural selection:
Favors for further survival and evolution those species that are best adapted to their environment (see the English moth)
The population is evolving: the ratio of different genetic types is changing and new types are created
Darwin:
[Figure: evolutionary cycle; recoverable labels: Offspring, Replacement]
Dictionary 1
Gene - smallest unit with genetic information
Genotype - collectivity of all genes
Phenotype - expression of genotype in environment
Individual - single member of a population with genotype and phenotype
Population - set of several individuals
Generation - one iteration of evaluation, selection and reproduction with variation
Selection does not act on the genotype at all but on the performance of the phenotype (fitness)
There is differential reproduction: phenotypes better adapted to the environment are likely to produce more offspring
Slightly unfaithful reproduction creates genotypic variations that affect traits of the phenotype, which in turn affect fitness
These genotypic variations are heritable
Recombination (crossover)
Choose two individuals from the current population: parents
New combination of the genetic material of these individuals: offspring
No new genetic information, only reshuffling of existing information
But can have strong effects on the phenotype
Duplication
Any doubling of a certain region, e.g. through unequal recombination
If this region consists of a gene, it is called gene duplication
http://en.wikipedia.org/wiki/Gene_duplication
Mutation
http://www.biocrawler.com/encyclopedia/Mutation
Selection and differential reproduction DECREASE diversity in the population
Genetic operators (mutation, recombination) INCREASE diversity of the population
Evolutionary Computation
Exploitation of concepts of natural evolution for problem solving using computers
Simulation of evolutionary processes (recombination, mutation, selection) for solving a desired problem
Particularly well-suited to complex, multidimensional problems too big to search exhaustively (non-linear optimization problems)
Cannot solve all problems perfectly, but has fewer restrictions than most problem-solving algorithms
Optimization
Finding the best solution to a problem
Mathematically: finding the minimum or maximum of a function (optimum)
[Figure: function curve with its maximum and minimum marked]
Optimization - Problems
Example: hill-climbing
Start with an estimate of the global maximum
Try to improve by finding other solutions that have a greater value than the current estimate (local search)
Problem: the search can stop at a local maximum instead of the global maximum
[Figure: curve with a local maximum and the global maximum]
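The hill-climbing problem can be illustrated with a minimal sketch. The landscape values and the function name `hill_climb` are made up for illustration only; they are not taken from the slides.

```python
def hill_climb(f, x, lo, hi):
    """Greedy local search over the integers lo..hi (illustrative sketch)."""
    while True:
        # Candidate moves: one step to the left or right, inside the range
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(x):
            return x  # no neighbour is better: a (possibly only local) maximum
        x = best


# Hypothetical landscape: local maximum at x = 2, global maximum at x = 8
landscape = [0, 3, 5, 3, 1, 2, 4, 7, 9, 6]


def f(x):
    return landscape[x]


print(hill_climb(f, 1, 0, 9))  # start left of the local peak
print(hill_climb(f, 5, 0, 9))  # start in the basin of the global peak
```

Which maximum is found depends only on the starting estimate, which is the weakness a population-based search tries to reduce.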
[Figure: evolutionary cycle with the steps Population, Evaluation, End?, Selection, Replacement]
Dictionary 2
Individual - one candidate solution
Population - set of individuals
Genotype - encoded representation of individual
Phenotype - decoded representation of individual
Mapping - decodes the genotype into the phenotype
Mutation - variability operator that modifies a genotype
Recombination/Crossover - variability operator mixing genotypes
Fitness - performance of a phenotype with regard to objective
Iteration - generation
EC - General properties
Exploit the collective learning process of a population (each individual = one solution = one search point)
Evaluation of individuals in their environment = measure of quality = fitness: comparison of individuals
Selection favors better individuals, who reproduce more often than those that are worse
Offspring is generated by random recombination and mutation of selected parents
Main trends
Genetic algorithms (GAs)
Genetic programming (subform of GAs)
Evolution strategies (ES)
Evolutionary programming (EP)
Genetic Algorithms
Example: f(x) = x^2
Range: [0, 31]
Goal: find the maximum (31^2 = 961)
Binary representation: strings of length 5 encode 32 numbers (0-31)
Fitness = f(x)
F(x) = x^2 - Selection
           binary   value   fitness
String 1   00110    6       36
String 2   00011    3       9
String 3   01010    10      100
String 4   10101    21      441
String 5   00001    1       1
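The value and fitness columns follow directly from the binary strings. A small sketch of the decoding step (the function names `decode` and `fitness` are illustrative, not from the slides):

```python
def decode(bits):
    """Genotype -> phenotype: read the bit string as an unsigned integer."""
    return int(bits, 2)


def fitness(bits):
    """Fitness for the running example f(x) = x^2."""
    return decode(bits) ** 2


population = ["00110", "00011", "01010", "10101", "00001"]
for i, s in enumerate(population, start=1):
    print(f"String {i}  {s}  value {decode(s):>2}  fitness {fitness(s):>3}")
```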
F(x) = x^2 - Recombination
Pair                 x-position   Parents          Offspring
String 1, String 2   4            00110, 00011     00111, 00010
String 3, String 4   2            01010, 10101     01101, 10010
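The two recombination steps can be reproduced with a one-point crossover. This is a sketch; `one_point_crossover` is an illustrative name, and the crossover position is taken to be the number of leading bits each offspring keeps from its first parent.

```python
def one_point_crossover(a, b, point):
    """Swap the tails of two equal-length strings after `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]


# Strings 1 and 2 with x-position 4, strings 3 and 4 with x-position 2
print(one_point_crossover("00110", "00011", 4))
print(one_point_crossover("01010", "10101", 2))
```

No bit value appears in the offspring that was not already present at that position in one of the parents: recombination only reshuffles existing information.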
F(x) = x^2 - Mutation
bit-flip:
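A bit-flip mutation could look like the following sketch. The per-bit mutation rate is an assumed parameter; the slides do not state which rate was used.

```python
import random


def bit_flip(bits, rate, rng=random):
    """Flip each bit independently with probability `rate` (assumed parameter)."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )


rng = random.Random(42)  # fixed seed only to make the demo repeatable
print(bit_flip("00110", 0.2, rng))
```

With a small rate most offspring are unchanged or differ in a single bit, so mutation searches close to existing solutions.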
F(x) = x^2
All individuals in the parent population are replaced by offspring in the new generation (generations are discrete!)
F(x) = x^2 - End
Possible end criteria:
Number of generations
Best fitness
Process time
No improvements after a number of generations
Best individual: 10111 (23), fitness 529
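Putting the steps together, a complete generational GA for f(x) = x^2 might be sketched as follows. Population size, mutation rate, the fixed number of generations as end criterion, and the `+ 1` in the selection weights are all assumptions made for this sketch, not values from the tutorial.

```python
import random


def fitness(bits):
    return int(bits, 2) ** 2


def select_parents(population, rng):
    # Fitness-proportional choice; +1 keeps the weights valid if all fitness is 0
    weights = [fitness(s) + 1 for s in population]
    return rng.choices(population, weights=weights, k=2)


def crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b for b in bits
    )


def run_ga(pop_size=6, length=5, generations=30, rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):  # end criterion: number of generations
        offspring = []
        while len(offspring) < pop_size:
            a, b = select_parents(population, rng)
            for child in crossover(a, b, rng):
                offspring.append(mutate(child, rate, rng))
        population = offspring[:pop_size]  # discrete generations: full replacement
        # Best-so-far is tracked for reporting only; it is not re-inserted
        best = max(population + [best], key=fitness)
    return best


best = run_ga()
print(best, int(best, 2), fitness(best))
```

The run is stochastic, so the reported best individual depends on the seed and need not be the optimum 11111.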
GAs - General
Genetic algorithms
History
Genetic coding
Finite strings (= genome, represents genotype)
Strings consist of units with information (unit = gene)
One string (= individual) = one possible solution of the problem
Genotype is often a set of real numbers or a bit string, e.g.:
1.853 0.492
What should the phenotype look like and how to encode it as a genotype?
How does one map from genotype to phenotype, considering the sources of variation (mutation and recombination)?
Highly problem dependent!
Hint: small changes to the genotype should often result in small changes to the phenotype, i.e. similar performance (heritability of traits)
Heritability of traits is important; otherwise the GA becomes only a random search
Mapping Example
Hamming distance:
Number of bits that have to be changed to map one string into another one
E.g. 000 and 001: distance = 1
Binary:
Neighboring numbers can differ in several bits (e.g. 011 and 100: distance = 3)
Gray:
Two neighboring numbers (phenotypes) always have a genotype distance of 1 (they differ by only one bit flip)
OPTIMAL mapping
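The difference between the two mappings can be checked with a short sketch. It uses the standard reflected Gray code (`n ^ (n >> 1)`); whether the slides used exactly this Gray variant is an assumption.

```python
def to_gray(n):
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


for n in range(8):
    print(n, format(n, "03b"), format(to_gray(n), "03b"))

# Genotype distance between neighbouring phenotypes n and n + 1
binary_steps = [hamming(n, n + 1) for n in range(7)]
gray_steps = [hamming(to_gray(n), to_gray(n + 1)) for n in range(7)]
print(binary_steps)
print(gray_steps)
```

In plain binary, moving from 3 to 4 needs three simultaneous bit flips, which a single mutation rarely produces; in Gray code every step to a neighbouring number is a single flip.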
Selection
Selection of individuals for differential reproduction of offspring in the next generation
Favors better solutions
Decreases diversity in the population
Selection - Roulette-Wheel
Each solution gets a region on a roulette wheel according to its fitness
Spin the wheel and select the solution marked by the roulette-wheel pointer
Stochastic selection (better fitness = higher chance of reproduction)
http://www.edc.ncl.ac.uk/highlight/rhjanuary2007g02.php
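A roulette wheel can be sketched by spinning a pointer over the cumulative fitness. The sketch assumes non-negative fitness values with a positive sum; `roulette_select` is an illustrative name.

```python
import random
from collections import Counter


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes non-negative fitness values with a positive total.
    """
    pointer = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit  # end of this individual's region on the wheel
        if pointer < running:
            return individual
    return population[-1]  # pointer landed exactly on the end of the wheel


population = ["00110", "00011", "01010", "10101", "00001"]
fitnesses = [36, 9, 100, 441, 1]
rng = random.Random(1)  # fixed seed only to make the demo repeatable
counts = Counter(roulette_select(population, fitnesses, rng) for _ in range(10000))
print(counts.most_common())
```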
Selection - Elitism
Selection based on fitness values
Keep the best individual of the current population
Unrealistic, but ensures that the best fitness of a generation never decreases
Decrease of diversity
Selection - Tournament
Randomly select q individuals from the current population
Winner: individual(s) with best fitness among these q individuals
Example:
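One possible example is the sketch below, which draws q individuals without replacement and returns a single winner; both choices are assumptions, since the original example is not preserved in the text.

```python
import random


def tournament_select(population, fitness, q, rng=random):
    """Draw q distinct individuals at random and return the fittest of them."""
    contestants = rng.sample(population, q)
    return max(contestants, key=fitness)


def fitness(bits):
    return int(bits, 2) ** 2


population = ["00110", "00011", "01010", "10101", "00001"]
rng = random.Random(0)  # fixed seed only to make the demo repeatable
print(tournament_select(population, fitness, 2, rng))
```

The tournament size q controls selection pressure: q = 1 is a purely random choice, while q equal to the population size always returns the best individual.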
Mutation
Goal: search around an existing good solution, possibly leave local optima
Example (real-valued gene): 1.853 -> 1.807
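For real-valued genotypes, a small random perturbation is one common way to search around an existing solution. The sketch below uses Gaussian noise; the slides do not say which distribution or step size was used, so `sigma` is a purely illustrative parameter.

```python
import random


def gaussian_mutation(genes, sigma=0.05, rng=random):
    """Add zero-mean Gaussian noise to every gene (assumed mutation scheme)."""
    return [g + rng.gauss(0.0, sigma) for g in genes]


rng = random.Random(0)  # fixed seed only to make the demo repeatable
parent = [1.853, 0.492]
child = gaussian_mutation(parent, sigma=0.05, rng=rng)
print(parent, "->", child)
```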
Recombination/Crossover
Usually explorative
Creates new strings by combining parts of two existing strings
[Figure: two parent strings recombined into offspring strings]
Recombination
Unequal:
Crossover points are chosen independently for each string
[Figure: two parent strings recombined at different crossover points into offspring]
Fitness
Fitness function
Nature:
[Figure: genotype space of neighboring bit strings, e.g. 0000, 0001, 0010, 0011, 0100, 1000, 1001, 1100]
How we move in that landscape over generations is defined by our variability operators, usually mutation and recombination
Now add fitness
Every snowflake is one individual; the search focuses on promising regions (due to differential reproduction)
x/y axes: kinship, i.e. the more genetic resemblance, the closer together
z axis: fitness
Easy to find the optimum by local search: neighboring genotypes have similar fitness (smooth curve, high evolvability)
[Figure: fitness landscape; axes: Genotypes, Fitness]
Here we will have a hard time finding the optimum
Low evolvability (fitness is right/wrong)
Either the problem is not well suited for a GA or the design is bad
[Figure: fitness landscape; axes: Genotypes, Fitness]
Many local optima, so we are likely to find one
However, there is not much of a gradient towards the global optimum; random search could do as well
[Figure: fitness landscape; axes: Genotypes, Fitness]
Fitness does not need to be static over generations
Changing fitness can allow the search to reach regions that would otherwise stay unexplored
Natural fitness is certainly very dynamic
Design issues
Useful e.g. if there are constraints on the range of solutions
Possible problems: loss of diversity and bias
Design decisions
Problem representation
Genetic operators with parameters
Mechanism of selection
Size of the population
Fitness function
Decisions are highly problem dependent
Parameters are not independent: you cannot optimize them one by one
Mutation, recombination:
create individuals that are in new regions (diversity!!)
fine tuning in current regions
Keep in mind
Start population has a lot of diversity
Invest search time in areas that have proven good in the past
Loss of diversity over evolutionary time
Premature convergence: quick loss of diversity poses a high risk of getting stuck in local optima
Evolvability:
Fitness landscape should not be too rugged
Heredity of traits
Small genetic changes should be mapped to small phenotype changes
Wrapping up Part 1
GA - Summary
Selection: focus on fittest individuals
Recombination: adds alternative solutions to the population
Mutation: makes sure that most of the search space is reached
Advantages:
Basic method is simple and broadly applicable
No need for a very detailed understanding of the problem
But can be adjusted to the problem if knowledge is present
Fast and can be scheduled in parallel
Disadvantages:
No guarantee to find the best solution
High computational demands
Adapting to the problem at hand can be hard, e.g. finding a suitable representation/mapping and evolutionary operators
Search can get caught in local optima
Populations are spatial, e.g. for speciation: interaction (mating, competition) is localized to maintain diversity
Populations have structure, e.g. niche protection: competition will be stronger if many individuals do the same, to maintain diversity
Diploidy with dominance / recessivity
N-point crossover and other variants
Morphogenesis instead of simple function mapping (allowing for modularity, making crossover less fatal)
[Figure: evolutionary cycle; recoverable labels: Population, Offspring, Replacement]
Selection
Selection based on comparison of individual phenotypes with the target phenotype
For phenotype read: input/output system
Parameter sets: each parameter set is used to build a different input/output system
Decoding
info string 1 -> System 1
info string 2 -> System 2
info string 3 -> System 3
Evaluating
Input -> System 1 -> Output 1
Input -> System 2 -> Output 2
Input -> System 3 -> Output 3
Comparing
Output 1, Output 2, Output 3: compare wrt Target
Decoding rules
Example F(x) = x^2
Info: 11011
Rules: 11011 decodes to 27
System: f(x)
Input: -
Output: 729
Target: Max
In general, the info string can encode: connectivity; parameter values
The basic control networks that underlie the development and responses of organisms
Involve interactions between genes, RNA, proteins
Input output transformation systems
Built using concepts taken from GRN-theory: assumptions on how GRNs work
As (very crude) models of the real thing
Interest in computational potential
Why aGRNs?
Information flow
DNA -> (Transcription) -> RNA -> (Translation) -> Protein
RNA -> (Reverse transcription) -> DNA
GRN Control
[Figure: DNA with coding region and non-coding region]
Upstream region: often contains interaction points for transcription factors
Transcription factors: proteins that repress or activate transcription
No limit to the number of activators or repressors
A protein that acts as an activator for one gene may act as a repressor for another
GRN Dynamics
[Figure: mRNA production and mRNA breakdown]
Symbols: Nodes
State, Store, Place (SBML: Species)
Transition, Process (SBML: Reaction)
Symbols: Arrows
SBML: ModifierSpecies
Input (SBML: Reactant)
Output (SBML: Product)
[Figures: mRNA production; mRNA breakdown; no activator, no production]
Rate equation (SBML: KineticLaw; transfer function)
Shape of the rate equation depends on the process (and how much we know about it)
Dynamics
Messenger RNA and protein product dynamics
[Figure: amounts of mRNA and protein over time]
Amount will reach plateau (usually; there are
In general: breakdown processes determine
Aim:
Model as realistic as possible? Computational tool? Control network?
GRN constituents:
Genes (if so: what do they represent)? Transcription factors? Other regulators (e.g. regulatory RNA)? Intermediates (e.g. mRNA)? Signals?
Transcription? Translation? Breakdown? Signalling?
Mass-action type rate equations? Saturable? Additive? Logical?
Continuous? Discrete multivalued? Boolean (if so: what do 0 and 1 represent)?
Representation of time:
Continuous? Discrete?
Evaluation method:
Numerical integration of rate equations (if so: stochastic or deterministic)? Finite state automaton? Boolean switching?
Control
Dynamics
Rate equations
Product production rate:
$v_p = k_p \, f(\text{modifiers})$
Product breakdown rate:
$v_b = k_b \, [P]$
Plateau value (steady state):
$[P] = \frac{k_p}{k_b} \, f(\text{modifiers})$
$k_p$, $k_b$: production and breakdown rate constants
$[x]$: concentration or amount of species x
P: gene product
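The plateau value can be checked numerically. The sketch below integrates production minus breakdown with a simple Euler scheme; the rate constants, the constant modifier term `f_mod`, the step size and the function name `simulate` are illustrative assumptions.

```python
def simulate(k_p, k_b, f_mod, p0=0.0, dt=0.01, steps=5000):
    """Euler integration of d[P]/dt = k_p * f_mod - k_b * [P]."""
    p = p0
    for _ in range(steps):
        production = k_p * f_mod  # production rate
        breakdown = k_b * p       # breakdown rate
        p += dt * (production - breakdown)
    return p


k_p, k_b, f_mod = 2.0, 0.5, 1.0    # hypothetical values
plateau = k_p / k_b * f_mod        # steady state where production = breakdown
print(simulate(k_p, k_b, f_mod), plateau)
```

The amount approaches the plateau at a speed set by the breakdown constant, and too large a time step in this kind of scheme can introduce artefacts that are not part of the model itself.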
Saturable repression:
$f_{S,R} = \frac{1}{1 + m[M]^p}$
Linear activation:
$f_{L,A} = m[M]^p$
Linear repression:
$f_{L,R} = -m[M]^p$
m: multiplier
p: exponent
No single proper representation method
Chosen method depends on the aim of the study and personal preference
However, be aware of:
the way your representation relates to the real thing
the type of simplifications that have been made
Tips:
usually, amounts or concentrations do not have negative values
simple negative feedback systems are often oscillatory, but the oscillations may well disappear when using smaller time steps or a less crude representation
Introduction
Biological clocks abound in all organisms, even the simplest single-celled ones like Gonyaulax (red tide organism): a characteristic example of the responsiveness of life on earth
Evolvability of Genetic Regulatory Networks (GRNs) as a paradigm for novel computation
Schematic drawing
[Figure: gene-node with activating (+) and inhibiting (-) inputs]
gene-node
binding sites: protein input from other genes
module: grouping of inputs
any number of binding sites, any number of modules
one protein output type and one activation type
Genome consists of a number of genes plus a compartment coding parameters global to the cell
'0'/'1' code; '2' module boundary, '3' gene boundary (used for compartmentalization)
Example: 0210013 | 010111021101020011113 | 0110011
The middle segment, ending in the gene delimiter '3', encodes one gene-node; each '2' inside it is a module boundary
Example gene: 010111021101020011113
Two modules:
1) inhibitory cis-module (starting with 0) binding a combination of proteins 5 (101) and 6 (110)
2) activatory cis-module (starting with 1) to which protein 5 (101) will bind
Layout: modules 1) and 2), then output; each module = regulator type + binding sites (b. site protein) + junk (ignored bits); output = protein + gene activation type
The last three zeros of 11010200 are ignored junk. The gene will produce protein 7 (111) and is off by default (last bit is 1).
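The decoding of this example gene can be sketched as follows. The rules are inferred from the single example above and are assumptions: binding sites are read as 3-bit protein numbers from the left, leftover bits are junk, and in the output section the last bit is the activation type and the three bits before it are the produced protein. `decode_gene` is an illustrative name, not the author's code.

```python
def decode_gene(gene):
    """Decode one gene string under the assumed rules described above."""
    body = gene.rstrip("3")                  # drop the gene delimiter
    *module_parts, output_part = body.split("2")
    modules = []
    for part in module_parts:
        if not part:
            continue                         # empty module: nothing to decode
        kind = "activatory" if part[0] == "1" else "inhibitory"
        sites = part[1:]
        # Complete 3-bit groups are protein numbers; trailing bits are junk
        proteins = [int(sites[i:i + 3], 2) for i in range(0, len(sites) - 2, 3)]
        modules.append((kind, proteins))
    return {
        "modules": modules,
        "output": int(output_part[-4:-1], 2),  # three bits before the last bit
        "default_on": output_part[-1] == "0",  # last bit 1 means off by default
    }


print(decode_gene("010111021101020011113"))
```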
Activation
Activation contd
As protein values are not boolean, AND is actually a minimum and OR is a sum, but the effect is very similar
Every gene is either of constitutive (default on, dotted line) or induced (default off, continuous line) type
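Example:
Protein levels: 5: 20, 6: 1
Inhibitory module (proteins 5 AND 6): minimum = 1, contributes -1
Activatory module (protein 5): = 20, contributes +20
Sum of modules: f(-1 + 20) = ~125, so activation increases
Protein level: 7: +125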
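The module arithmetic in this example (AND as minimum, OR as sum) can be sketched as below. The function f that turns the summed input into the protein change is not given in the slides, so the sketch stops at the summed input; all names are illustrative.

```python
def module_input(levels, proteins):
    """AND over a module's binding sites: the minimum of the bound protein levels."""
    return min(levels.get(p, 0.0) for p in proteins)


def gene_input(levels, modules):
    """OR over modules: signed sum (activatory adds, inhibitory subtracts)."""
    total = 0.0
    for kind, proteins in modules:
        if not proteins:
            continue  # a module without binding sites contributes nothing
        sign = 1.0 if kind == "activatory" else -1.0
        total += sign * module_input(levels, proteins)
    return total


levels = {5: 20.0, 6: 1.0}
modules = [("inhibitory", [5, 6]), ("activatory", [5])]
print(gene_input(levels, modules))  # -1 + 20
```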
Environmental in-/output
Variations with perturbations:
+/- noise with std. dev. 0.1
+/- 2x blackout of 20 steps
Desired behaviors 1) or 3)
Closeness of match = fitness
Tournament selection:
Mutation:
Recombination:
children
Length of genes might change; the number of genes is held constant in these experiments
Recombination
Unequal crossing-over allows for genomes of varying length, important for a varying number of binding sites and modules
Unequal: crossover point is shifted by an offset
Note that the offset always stays within the compartment, so all genes but the one at the crossover point are kept intact
[Figure: parents 0210013 and 1102103, offset of 3, offspring]
Can easily evolve to show cyclic behavior
Genome length and junk length increase on average (average over 10 runs)
Internalization of (quasi-) periodic behaviour in many cases: the more unreliable the input, the more this was found
Two evolved GRNs getting out of synchrony when stimuli are missing
Evolvability - Heterochronicity
Slightly mutated variants
All variants are one or two bit flips away from each other
Can allow heterochronic control: changes in timing are possible while preserving general dynamics
Remember: small genotype changes usually should cause small phenotype changes for smooth adaptation
Differentiation schematic
Individual has two cells with the same genome and almost the same input, but different behavior is required
Same ultimate goal functions (periodic and inverse) but different fitness evaluations over initial generations
Immediate setting:
Fitness is the final objective from the beginning: one cell reproduces the phase of the input while the other has to produce the inverse
Gradual setting:
Fitness is initially the same phase reproduction for both cells, but the target phase shifts for one of the cells over generations (changing fitness landscape!)
1) immediate evaluation
2) gradual evaluation
Best results are similar, but the average is much better for 2), which robustly finds good results
Part 4: NetBuilder'
A tool for construction, modelling, simulation (stochastic, deterministic, hybrid), evolution and (in future) analysis of GRNs
Uses the Petri-net formalism
Download:
http://strc.herts.ac.uk/bio/maria/Apostrophe/
NetBuilder
NetBuilder':
completely overhauled version
different model visualisation
more simulation and analysis methods
bipartite graph
place - e.g. proteins (circle)
transition - e.g. reaction, gene (rectangle)
arc - connection between a place and a transition (what is consumed to produce what)
[Figure: a place connected to a transition]
NetBuilder - GUI
[Screenshot: GUI with drawing area, model hierarchy and tool bar (open, new layer, save, print, zoom, copy, paste, arc, text, transition)]
Evolution in NetBuilder'
Genotype - Phenotype
Gene:
These arcs or nodes in a gene are fixed and cannot be removed by evolutionary operators, but their attributes can be adjusted
Network:
Furthermore
Remember:
No limit to the number of activators or repressors
A protein that acts as an activator for one gene may act as a repressor for another
Mathematical description:
Automatically created by NetBuilder', or users add their own function to each transition
Environmental input:
Each place can have any input function (e.g. sine or step functions like in Johannes' GA)
Selection
Tournament:
Randomly select 15 networks (default)
The two best networks of these 15 are recombined
Elitism:
Recombination
Two networks are recombined
A gene and its arcs go into the child
Probability: 90% (default)
Mutations
Fitness
Fitness: likeness between the target function and the current protein results
Parameters
as adjustable as possible
General parameters:
Probabilities:
Summary NetBuilder'
Easy-to-use GUI
Free and open-source
Create network, simulate (and evolve) it
Evolutionary algorithm is in test phase now and will be available in NetBuilder soon!
https://lists.sourceforge.net/lists/listinfo/apostrophe-users
Johannes: In Silico Evolution Of Biological Clocks With Genetic Regulatory Networks, F5
Katja: Netbuilder' - A Tool For The Modeling And Simulation Of Genetic Regulatory Networks, G8
Acknowledgement
Chrystopher L. Nehaniv
Ralph Gauges
Mark Robinson
Slides:
http://strc.herts.ac.uk/bio/maria/Apostrophe/
Resources
Evolutionary Algorithms
http://www.talkorigins.org/faqs/faq-intro-to-biology.html
Evonet flying circus: http://evonet.lri.fr/CIRCUS2/node.php
The on-line tutorial on evolutionary computation: http://www.lcc.uma.es/~ccottap/semEC/
Bäck, T., Fogel, D. B. and Michalewicz, Z. (eds.): Evolutionary Computation 1 & 2. Taylor & Francis 2000
Goldberg, D. E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley 1989
Langdon, W. B. and Poli, R.: Foundations of Genetic Programming. Springer 2002