TEC-SS2025-02-foundations-annotations
Dirk Sudholt
Aims for today
Source: http://fcampelo.github.io/EC-Bestiary/
Which Metaheuristic Paradigm is the Best?
Past views of metaheuristics (see sketch):
Metaheuristics are good across all problems.
Only beaten by specialised algorithms on a few problems.
No Free Lunch Theorem [Wolpert and Macready, 1997, Droste et al., 2002]
Consider search algorithms for functions f ∈ F where F is closed under permutations.
Let T(A) be the average number of different search points sampled by A before an optimum is found (under the uniform distribution on F). Then for any two search algorithms A, B we have T(A) = T(B).
Building-block hypothesis
Crossover is effective because it combines good ‘building blocks’.
Mitchell, Forrest, and Holland [1992]: “We designed a problem with building blocks on which schema theory predicts: GAs outperform hill climbers.”
Forrest and Mitchell [1993]: “We ran experiments and found out: hill climbers outperform GAs.”
Conclusion
Need mathematical rigour – theorems and proofs!
Brief History of Runtime Analysis
From 1997: Ingo Wegener and members of his Chair (Thomas Jansen, Stefan Droste) in Dortmund
Collaborative Research Centre “Computational Intelligence” (12 years)
Treasure trove for foundations and methods: the book chapter by Doerr [2020].
Probabilities
Prob(A) is the probability of event A; Prob(Ā) = 1 − Prob(A)
Prob(A ∪ B) = Prob(A) + Prob(B) − Prob(A ∩ B) ≤ Prob(A) + Prob(B)
If A and B are independent, Prob(A ∩ B) = Prob(A) · Prob(B)
Conditional probabilities: Prob(A | B) = Prob(A ∩ B) / Prob(B)
Expectations
E(X) = Σ_x Prob(X = x) · x
If X only assumes values in N, E(X) = Σ_{x=1}^{∞} Prob(X ≥ x)
Linearity of expectation: E(X + Y) = E(X) + E(Y)
Law of total expectation: E(X) = E(X | A) · Prob(A) + E(X | Ā) · Prob(Ā)
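As a quick numerical check of the tail-sum formula above (this example is not part of the original slides), the following Python sketch estimates E(X) for a geometric waiting time both directly and via Σ_{x≥1} Prob(X ≥ x):

import random

# Monte-Carlo sanity check of E(X) = sum_{x>=1} Prob(X >= x) for an N-valued
# random variable, here a geometric waiting time with success probability p,
# so that the exact value is E(X) = 1/p.

def geometric(p, rng=random):
    """Number of independent trials with success probability p until the first success."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

p = 0.25
n_samples = 100_000
samples = [geometric(p) for _ in range(n_samples)]

# Direct estimate: E(X) ≈ average of the samples.
direct = sum(samples) / n_samples

# Tail-sum estimate: sum over x of the empirical Prob(X >= x).
tail_sum = sum(sum(1 for s in samples if s >= x) / n_samples
               for x in range(1, max(samples) + 1))

print(f"direct estimate of E(X): {direct:.3f}")
print(f"tail-sum estimate:       {tail_sum:.3f}   (exact value 1/p = {1/p})")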
Inequalities
For all x ∈ R, 1 + x ≤ e^x
For all n ∈ N, (1 − 1/n)^n ≤ 1/e ≤ (1 − 1/n)^{n−1}
Harmonic numbers H(n) := Σ_{i=1}^{n} 1/i satisfy ln(n) ≤ H(n) ≤ ln(n) + 1.
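The following Python snippet (added here, not part of the slides) spot-checks all three facts numerically:

import math

# Numerical spot-check of the inequalities listed above.

# 1 + x <= e^x for all real x (a few sample points)
for x in [-5.0, -0.5, 0.0, 0.5, 5.0]:
    assert 1 + x <= math.exp(x)

# (1 - 1/n)^n <= 1/e <= (1 - 1/n)^(n-1), checked for n = 2, ..., 999
for n in range(2, 1000):
    assert (1 - 1/n) ** n <= 1 / math.e <= (1 - 1/n) ** (n - 1)

# Harmonic numbers: ln(n) <= H(n) <= ln(n) + 1
H = 0.0
for n in range(1, 1000):
    H += 1 / n
    assert math.log(n) <= H <= math.log(n) + 1

print("all inequality checks passed")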
Task
Find a hidden target string.
target   ? ? ? ? ? ? ? ?
solution 1 1 1 1 0 0 1 0

Task
Find the all-ones string.
target   1 1 1 1 1 1 1 1
solution 1 0 1 0 1 0 0 1
Theorem
The expected running time of RLS on OneMax is at most n ln(n) + n.
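To make the statement concrete, here is a minimal Python sketch of RLS on OneMax; it assumes the textbook definition of RLS (flip exactly one uniformly chosen bit per step and accept the offspring if it is at least as good), since the algorithm's pseudocode is not part of this extract:

import math
import random

def one_max(x):
    """OneMax(x) = number of 1-bits in the bit string x."""
    return sum(x)

def rls_on_onemax(n, rng=random):
    """Run RLS on OneMax from a uniformly random start; return the number of
    iterations until the all-ones string is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    steps = 0
    while one_max(x) < n:
        steps += 1
        i = rng.randrange(n)            # pick one bit position uniformly at random
        y = x[:]
        y[i] = 1 - y[i]                 # flip exactly that bit
        if one_max(y) >= one_max(x):    # accept if not worse
            x = y
    return steps

n = 100
runs = [rls_on_onemax(n) for _ in range(50)]
print(f"average over 50 runs: {sum(runs) / len(runs):.1f}")
print(f"upper bound n ln(n) + n ≈ {n * math.log(n) + n:.1f}")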
Properties of the (1+1) EA with standard bit mutation (see the sketch after this list):
reflects basic principle of mutation and selection
stochastic hill climber
flips one bit in expectation
can mimic one step of RLS
can escape from local optima by flipping many bits
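A minimal Python sketch of the (1+1) EA with standard bit mutation, matching the properties listed above. The precise formulation on the original slides is not in this extract, so the details below (initialisation, acceptance of equally good offspring) follow the usual textbook variant, with OneMax as a placeholder fitness function:

import random

# (1+1) EA with standard bit mutation: each bit is flipped independently with
# probability 1/n, so one bit flips in expectation, and flipping k specific bits
# has probability (1/n)^k (1 - 1/n)^(n-k).

def standard_bit_mutation(x, rng=random):
    """Flip each bit of x independently with probability 1/len(x)."""
    n = len(x)
    return [1 - b if rng.random() < 1 / n else b for b in x]

def one_plus_one_ea(fitness, n, optimum_value, max_steps=10**6, rng=random):
    """(1+1) EA: one parent, one offspring per step, elitist replacement.
    Returns the number of steps until a point with fitness optimum_value is found."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for step in range(1, max_steps + 1):
        y = standard_bit_mutation(x, rng)
        if fitness(y) >= fitness(x):    # accept if not worse (can mimic one RLS step)
            x = y
        if fitness(x) >= optimum_value:
            return step
    return max_steps

one_max = sum                            # OneMax(x) = number of 1-bits
print(one_plus_one_ea(one_max, n=100, optimum_value=100))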
[Figure: fitness-level partition of the search space into sets A1, A2, A3, A4, ordered by increasing fitness.]
OneMax(x) := Σ_{i=1}^{n} x_i counts the number of 1-bits in x.
LO(x) := Σ_{i=1}^{n} Π_{j=1}^{i} x_j counts the number of leading ones (e.g. LO(11101100) = 3).
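For concreteness, direct Python implementations of both benchmark functions (added here, not from the slides), checked on the example string 11101100:

def one_max(x):
    """OneMax(x): number of 1-bits in x."""
    return sum(x)

def leading_ones(x):
    """LO(x): length of the longest prefix of x consisting only of 1-bits."""
    lo = 0
    for bit in x:
        if bit != 1:
            break
        lo += 1
    return lo

x = [1, 1, 1, 0, 1, 1, 0, 0]   # the bit string 11101100
print(one_max(x))              # 5
print(leading_ones(x))         # 3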
[Figure: the Jump function with gap parameter k. Fitness is plotted against the number of 1-bits: fitness increases up to n − k 1-bits, then drops, so the optimum 1^n can only be reached by flipping the remaining k 0-bits in one mutation.]
Take s_0, ..., s_{n−1} as for OneMax and s_n = (1/n)^k · (1 − 1/n)^{n−k} ≥ 1/(e·n^k).
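Plugging these success probabilities into the fitness-level method yields the familiar upper bound. The calculation below is a sketch: it assumes, as in the usual OneMax fitness-level argument (not spelled out in this extract), that s_i ≥ (n − i)/(en) for 0 ≤ i ≤ n − 1.

E(T) ≤ Σ_{i=0}^{n} 1/s_i ≤ Σ_{i=0}^{n−1} en/(n − i) + e·n^k = e·n·H(n) + e·n^k = O(n log n + n^k).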
(1+1) EA Always Finds an Optimum
Theorem
(1+1) EA optimises every function in expected time at most n^n.
Fitness-level partition:
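The partition used on the original slide is not included in this extract; one standard way to obtain the bound is the trivial two-level partition sketched below (an assumption, not necessarily the slide's exact argument).

Let A_1 = {x : f(x) < f_max} and A_2 = {x : f(x) = f_max}. For any x ∈ A_1, fix a global optimum x* at Hamming distance d from x. Standard bit mutation turns x into x* with probability (1/n)^d · (1 − 1/n)^{n−d} ≥ n^{−n} for n ≥ 2, since every bit receives its correct value with probability at least min(1/n, 1 − 1/n) = 1/n. Hence s_1 ≥ n^{−n}, and the fitness-level method gives E(T) ≤ 1/s_1 ≤ n^n.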
Outlook
Methods for the analysis of RSHs (randomised search heuristics)
Fitness-level method (and extensions)
Drift analysis
Tail bounds, typical runs
Random walks
Design aspects
How useful are populations?
How to ensure diversity within the population?
How important is recombination?
Parallel variants of evolutionary algorithms
Parameter control: how to learn good parameters
Focus will be on
Single-objective, discrete, fixed-size problems
Runtime analysis (other theories are available)
Illustrative, easy to describe problems that we can understand
Conclusions
No Free Lunch theorems state that any two search heuristics have the same performance
▶ But the No Free Lunch scenario is neither realistic nor interesting.
▶ Need to consider specific problem classes for meaningful results.
Early approaches (like schema theory) lacked rigour and led to false claims.
Reviewed foundations and tools from probability theory
Runtime analysis
Seen a first runtime analysis: RLS cracks codes (optimises OneMax) on n bits in expected time O(n log n).
The fitness-level method is a simple method for obtaining upper bounds for the (1+1) EA.
Runtime bounds for the (1+1) EA on OneMax, LeadingOnes and Jump
The expected runtime of EAs with standard bit mutation is bounded by n^n.