A Tutorial On Differential Evolution With Python - Pablo R. Mier
I have to admit that I'm a great fan of the Differential Evolution (DE) algorithm. This algorithm, invented by R. Storn and K. Price (https://link.springer.com/article/10.1023%2FA%3A1008202821328?LI=true) in 1997, is a very powerful algorithm for black-box optimization (also called derivative-free optimization). Black-box optimization is about finding the minimum of a function $f(x): \mathbb{R}^n \rightarrow \mathbb{R}$, where we don't know its analytical form, and therefore no derivatives can be computed to minimize it (or they are hard to approximate). The figure below shows how the DE algorithm approximates the minimum of a function in successive steps:

The optimization of black-box functions is very common in real-world problems, where the function to be optimized is very complex (and may involve the use of simulators or external software for the computations). For this kind of problem, DE works pretty well, and that's
why it's very popular for solving problems in many different fields, including Astronomy, Chemistry, Biology, and many more. For example, the European Space Agency (ESA) uses DE to design optimal trajectories (http://www.esa.int/gsp/ACT/doc/INF/pub/ACT-RPR-INF-2014-(PPSN)CstrsOptJupiterCapture.pdf) in order to reach the orbit of a planet using as little fuel as possible. Sounds awesome, right? Best of all, the algorithm is very simple to understand and to implement. In this tutorial, we will see how to implement it, how to use it to solve some problems, and we will build intuition about how DE works.
Let’s start!
Before getting into more technical details, let’s get our hands dirty. One thing that
fascinates me about DE is not only its power but its simplicity, since it can be implemented
in just a few lines. Here is the code for the DE algorithm using the rand/1/bin schema (we
will talk about what this means later). It only took me 27 lines of code using Python with
Numpy:
```python
 1 import numpy as np
 2
 3 def de(fobj, bounds, mut=0.8, crossp=0.7, popsize=20, its=1000):
 4     dimensions = len(bounds)
 5     pop = np.random.rand(popsize, dimensions)
 6     min_b, max_b = np.asarray(bounds).T
 7     diff = np.fabs(min_b - max_b)
 8     pop_denorm = min_b + pop * diff
 9     fitness = np.asarray([fobj(ind) for ind in pop_denorm])
10     best_idx = np.argmin(fitness)
11     best = pop_denorm[best_idx]
12     for i in range(its):
13         for j in range(popsize):
14             idxs = [idx for idx in range(popsize) if idx != j]
15             a, b, c = pop[np.random.choice(idxs, 3, replace=False)]
16             mutant = np.clip(a + mut * (b - c), 0, 1)
17             cross_points = np.random.rand(dimensions) < crossp
18             if not np.any(cross_points):
19                 cross_points[np.random.randint(0, dimensions)] = True
20             trial = np.where(cross_points, mutant, pop[j])
21             trial_denorm = min_b + trial * diff
22             f = fobj(trial_denorm)
23             if f < fitness[j]:
24                 fitness[j] = f
25                 pop[j] = trial
26                 if f < fitness[best_idx]:
27                     best_idx = j
28                     best = trial_denorm
29         yield best, fitness[best_idx]
```
differential_evolution.py (https://gist.github.com/pablormier/0caff10a5f76e87857b44f63757729b0)
This code is completely functional: you can paste it into a Python terminal and start playing with it (you need numpy >= 1.7.0). Don't worry if you don't understand everything yet; we will see later what each line of this code means. The good thing is that we can start playing with it right now without knowing how it works. The only two mandatory parameters that we need to provide are fobj and bounds:
fobj: the function $f(x)$ to optimize. It can be a function defined with a def or a lambda expression. For example, suppose we want to minimize the function $f(x) = \sum_{i=1}^{n} x_i^2 / n$. If x is a numpy array, our fobj can be defined as:

```python
def fobj(x):
    value = 0
    for i in range(len(x)):
        value += x[i]**2
    return value / len(x)
```

bounds: a list with the lower and upper bound for each parameter of the function. For example, bounds = [(-5, 5), (-5, 5), (-5, 5), (-5, 5)] means that each variable $x_i$, $i \in \{1, \dots, 4\}$, is bound to the interval [-5, 5].
For example, let's find the value of x that minimizes the function $f(x) = x^2$, looking for values of x between -100 and 100:
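A minimal call that does this (a sketch):

```python
>>> it = list(de(lambda x: x**2, bounds=[(-100, 100)]))
>>> print(it[-1])
```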
The first value returned (array([ 0.])) represents the best value found for x (in this case it is just a single number, since the function is 1-D), and the value of f(x) for that x is returned as the second value (array([ 0.])).
Note: for convenience, I defined the de function as a generator function that yields the best solution x and its corresponding value of $f(x)$ at each iteration. In order to obtain the last solution, we only need to consume the iterator, or convert it to a list and obtain the last value with list(de(...))[-1].
Yeah I know, this is too easy. Now, let's try the same example in a multi-dimensional setting, with the function now defined as $f(x) = \sum_{i=1}^{n} x_i^2 / n$, for n=32 dimensions. This is what it looks like in 2D:
```python
# https://github.com/pablormier/yabox
# pip install yabox
>>> from yabox.problems import problem
>>> problem(lambda x: sum(x**2)/len(x), bounds=[(-5, 5)] * 2).plot3d()
```
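A sketch of such a run (assuming the same [-5, 5] bounds and the default 1,000 iterations):

```python
>>> result = list(de(lambda x: sum(x**2)/len(x), bounds=[(-5, 5)] * 32))
>>> best_x, best_f = result[-1]
```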
This time the best value for f(x) was 6.346, so we didn't obtain the optimal solution $f(0, \dots, 0) = 0$. Why? DE doesn't guarantee to obtain the global minimum of a function. What it does is approach the global minimum in successive steps, as shown in Fig. 1. So, in general, the more complex the function, the more iterations are needed. This raises a new question: how does the dimensionality of a function affect the convergence of the algorithm? In general terms, the difficulty of finding the optimal solution increases exponentially with the number of dimensions (parameters). This effect is called the "curse of dimensionality" (https://en.wikipedia.org/wiki/Curse_of_dimensionality). For example, suppose we want to find the minimum of a 2D function whose input values are binary. Since they are binary and there are only two possible values for each one, we would need to evaluate in the worst case $2^2 = 4$ combinations of values: $f(0,0)$, $f(0,1)$, $f(1,0)$ and $f(1,1)$. But if we have 32 parameters, we would need to evaluate the function for a total of $2^{32}$ = 4,294,967,296 possible combinations in the worst case (the size of the search space grows exponentially). This makes the problem much, much more difficult, and any metaheuristic algorithm like DE would need many more iterations to find a good approximation. Knowing this, let's run the algorithm again, but for 3,000 iterations instead of just 1,000:
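A sketch of the same run with its=3000:

```python
>>> result = list(de(lambda x: sum(x**2)/len(x), bounds=[(-5, 5)] * 32, its=3000))
>>> best_x, best_f = result[-1]
```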
Now we obtained a much better solution, with a value very close to 0. In this case we only needed a few thousand iterations to obtain a good approximation, but with complex functions we would need many more iterations, and yet the algorithm could get trapped in a local minimum. We can plot the convergence of the algorithm very easily (this is where the implementation using a generator function comes in handy):
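A minimal plotting sketch:

```python
>>> import matplotlib.pyplot as plt
>>> solutions, fitnesses = zip(*result)  # unzip the (best, fitness) pairs
>>> plt.plot(fitnesses)
>>> plt.xlabel('Iteration')
>>> plt.ylabel('f(x)')
```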
Fig. 2 shows how the best solution found by the algorithm gets closer and closer to the global minimum as more iterations are executed. Now we can represent in a single plot how the complexity of the function affects the number of iterations needed to obtain a good approximation:
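One way to produce such a plot (a sketch; the compared dimensionalities are an assumption):

```python
>>> for d in [8, 16, 32, 64]:
...     it = list(de(lambda x: sum(x**2)/len(x), bounds=[(-5, 5)] * d, its=3000))
...     plt.plot([f for _, f in it], label='d = {}'.format(d))
>>> plt.legend()
```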
The plot makes it clear that when the number of dimensions grows, the number of iterations
required by the algorithm to find a good solution grows as well.
How does it work?
Now it's time to talk about how these 27 lines of code work. Differential Evolution, as the name suggests, is a type of evolutionary algorithm. An evolutionary algorithm is an algorithm
that uses mechanisms inspired by the theory of evolution, where the fittest individuals of a
population (the ones that have the traits that allow them to survive longer) are the ones that
produce more offspring, which in turn inherit the good traits of the parents. This makes the
new generation more likely to survive in the future as well, and so the population improves
over time, generation after generation. This is possible thanks to different mechanisms
present in nature, such as mutation, recombination and selection, among others.
Evolutionary algorithms apply some of these principles to evolve a solution to a problem.
Components
The algorithm only needs the two mandatory components we already introduced:
fobj: the objective function to minimize. It takes a candidate solution (a vector of parameters) and returns a value measuring how good it is.
bounds: a list with the lower and upper bound for each parameter, which defines the search space.
Initialization
The first step in every evolutionary algorithm is the creation of a population with popsize
individuals. An individual is just an instantiation of the parameters of the function fobj. At
the beginning, the algorithm initializes the individuals by generating random values for each
parameter within the given bounds. For convenience, I generate uniform random numbers
between 0 and 1, and then I scale the parameters (denormalization) to obtain the
corresponding values. This is done in lines 4-8 of the algorithm.
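An equivalent sketch for a 4-parameter problem with 10 individuals:

```python
>>> popsize, dimensions = 10, 4
>>> pop = np.random.rand(popsize, dimensions)  # uniform random values in [0, 1)
```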
This generates our initial population of 10 random vectors. Each component x[i] is normalized to the interval [0, 1]. We will use the bounds to denormalize each component only when evaluating them with fobj.
Evaluation
The next step is to apply a linear transformation to convert each component from [0, 1] to
[min, max]. This is only required to evaluate each vector with the function fobj:
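A sketch of that transformation, assuming the bounds = [(-5, 5)] * 4 from before:

```python
>>> bounds = [(-5, 5)] * 4
>>> min_b, max_b = np.asarray(bounds).T
>>> diff = np.fabs(min_b - max_b)
>>> pop_denorm = min_b + pop * diff  # scale each component from [0, 1] to [-5, 5]
```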
At this point we have our initial population of 10 vectors, and now we can evaluate them using our fobj. Although these vectors are random points of the function space, some of them are better than others (they have a lower $f(x)$). Let's evaluate them:
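A sketch mirroring lines 9-11 of the algorithm:

```python
>>> fitness = np.asarray([fobj(ind) for ind in pop_denorm])
>>> best_idx = np.argmin(fitness)
>>> pop_denorm[best_idx], fitness[best_idx]
```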
After evaluating these random vectors, we can see that the vector x = [3., -0.68, -4.43, -0.57] is the best of the population, with $f(x) = 7.34$, so these values should be closer to the ones we're looking for. The evaluation of this initial population is done in line 9 of the algorithm and stored in the variable fitness.
Mutation
The next step is mutation. For each vector pop[j] in the population, we randomly select three other vectors a, b and c, all distinct from each other and from pop[j] (lines 14-15 of the algorithm). For the first vector (j=0):

```python
>>> idxs = [idx for idx in range(popsize) if idx != 0]
>>> idxs
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> selected = np.random.choice(idxs, 3, replace=False)
>>> selected
array([1, 4, 7])
>>> a, b, c = pop[selected]
```
Now, we create a mutant vector by combining a, b and c. How? By computing the difference (now you know why it's called differential evolution) between b and c and adding it to a, after multiplying it by a constant called the mutation factor (the parameter mut). A larger mutation factor increases the search radius but may slow down the convergence of the algorithm. Values for mut are usually chosen from the interval [0.5, 2.0]. For this example, we will use the default value of mut = 0.8:
```python
>>> mut = 0.8
>>> mutant = a + mut * (b - c)
```
Note that after this operation, we can end up with a vector that is not normalized (the second value is greater than 1 and the third one is smaller than 0). The next step is to fix those situations. There are two common methods: generating a new random value in the interval [0, 1], or clipping the values to the interval, so that values greater than 1 become 1 and values smaller than 0 become 0. I chose the second option just because it can be done in one line of code using numpy.clip:

```python
>>> mutant = np.clip(mutant, 0, 1)
```
Now that we have our mutant vector, the next step is called recombination. Recombination is about mixing the information of the mutant with the information of the current (target) vector to create a trial vector. This is done by replacing the numbers at some positions in the current vector with the ones in the mutant vector. For each position, we decide (with a probability defined by crossp) whether that number will be replaced by the one in the mutant at the same position. To generate the crossover points, we just need to generate uniform random values in [0, 1] and check if they are less than crossp. This method is called binomial crossover, since the number of selected locations follows a binomial distribution.
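A sketch of this step (line 17 of the algorithm), consistent with the result described next:

```python
>>> crossp = 0.7
>>> cross_points = np.random.rand(dimensions) < crossp
>>> cross_points
array([False,  True, False,  True])
```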
In this case we obtained two True values, at positions 1 and 3, which means that the values at positions 1 and 3 of the current vector will be taken from the mutant. This can be done in one line again, using the numpy function where:
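For our example with j=0 (line 20 of the algorithm):

```python
>>> trial = np.where(cross_points, mutant, pop[0])
```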
Replacement
After generating our new trial vector, we need to denormalize it and evaluate it to measure how good it is. If the trial vector is better than the current vector (pop[0]), then we replace the current vector with the new one:
```python
>>> trial_denorm = min_b + trial * diff  # denormalize (line 21 of the algorithm)
>>> fobj(trial_denorm)
13.425000000000001
```
In this case, the trial vector is worse than the target vector (13.425 > 12.398), so the target vector is preserved and the trial vector discarded. All these steps are then repeated for the remaining individuals (pop[j] for j=1 to j=9), which completes the first iteration of the algorithm. After this process, some of the original vectors of the population will have been replaced by better ones, and after many iterations the whole population will eventually converge towards the solution (it's a kind of magic, uh?).
Let's now see the algorithm in action with another concrete example. Given a set of points (x, y), the goal of the curve fitting problem is to find the polynomial that best fits the given points by minimizing, for example, the sum of the distances between each point and the curve. For this purpose, we are going to generate our set of observations (x, y) using the function $f(x) = \cos(x)$, and adding a small amount of gaussian noise:
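A sketch of the data generation (the sample size, range and noise level are assumptions):

```python
>>> x = np.linspace(0, 10, 500)
>>> y = np.cos(x) + np.random.normal(0, 0.2, 500)  # cos(x) plus gaussian noise
```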
Figure 5. Dataset of 2D points (x, y) generated using the function y = cos(x) with
gaussian noise.
Our goal is to fit a curve (defined by a polynomial) to the set of points that we generated before. This curve should be close to the original $f(x) = \cos(x)$ used to generate the points. We need a polynomial with enough degrees of freedom to produce at least the 4 bends present in the curve. For this purpose, a polynomial of degree 5 should be enough (you can try with more/fewer degrees to see what happens):
$f_{model}(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + w_5 x^5$
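A minimal definition of this model, consistent with the calls used below (fmodel(x, w)):

```python
def fmodel(x, w):
    # evaluate the degree-5 polynomial with coefficients w at the points x
    return sum(wi * x**i for i, wi in enumerate(w))
```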
Using this expression, we can generate an infinite set of possible curves. For example:
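For instance, we can plot a few polynomials with random coefficients (a sketch; the coefficient range is an assumption):

```python
>>> for _ in range(10):
...     w = np.random.rand(6) * 10 - 5  # random coefficients in [-5, 5)
...     plt.plot(x, fmodel(x, w))
>>> plt.ylim(-2, 2)  # keep the view around the data points
```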
Among this infinite set of curves, we want the one that best approximates the original function $f(x) = \cos(x)$. For this purpose, we need a function that measures how good a polynomial is. We can use, for example, the Root Mean Square Error (RMSE) (https://en.wikipedia.org/wiki/Root-mean-square_deviation) function:
```python
def rmse(w):
    y_pred = fmodel(x, w)
    return np.sqrt(sum((y - y_pred)**2) / len(y))
```
Now we have a clear description of our problem: we need to find the parameters $\mathbf{w} = (w_0, w_1, w_2, w_3, w_4, w_5)$ of our degree-5 polynomial that minimize the rmse function. Let's evolve a population of 20 random polynomials for 2,000 iterations with DE:
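A sketch of this run (the coefficient bounds are an assumption):

```python
>>> result = list(de(rmse, bounds=[(-5, 5)] * 6, its=2000))
>>> best_w, best_rmse = result[-1]
```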
We obtained a solution with an RMSE of ~0.215. We can plot this polynomial to see how good our approximation is:
```python
>>> plt.scatter(x, y)
>>> plt.plot(x, np.cos(x), label='cos(x)')
>>> plt.plot(x, fmodel(x, [0.99677643, 0.47572443, -1.39088333,
...                        0.50950016, -0.06498931, 0.00273167]), label='result')
>>> plt.legend()
```
Figure 7. Approximation of the original function f (x) = cos(x) used to generate the data
points, after 2000 iterations with DE.
Not bad at all! Now let's see the algorithm in action, and how it evolves the population of random vectors until all of them converge towards the solution. It is very easy to create an animation with matplotlib, using a slight modification of our original DE implementation that yields the entire population after each iteration instead of just the best vector:
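A sketch of that modification (here named de_all; the name and exact form are assumptions), yielding the denormalized population, the fitness array and the index of the best individual, which is what the animation code below expects:

```python
def de_all(fobj, bounds, mut=0.8, crossp=0.7, popsize=20, its=1000):
    # same as de(), but yields the whole population at every iteration
    dimensions = len(bounds)
    pop = np.random.rand(popsize, dimensions)
    min_b, max_b = np.asarray(bounds).T
    diff = np.fabs(min_b - max_b)
    fitness = np.asarray([fobj(ind) for ind in (min_b + pop * diff)])
    best_idx = np.argmin(fitness)
    for i in range(its):
        for j in range(popsize):
            idxs = [idx for idx in range(popsize) if idx != j]
            a, b, c = pop[np.random.choice(idxs, 3, replace=False)]
            mutant = np.clip(a + mut * (b - c), 0, 1)
            cross_points = np.random.rand(dimensions) < crossp
            if not np.any(cross_points):
                cross_points[np.random.randint(0, dimensions)] = True
            trial = np.where(cross_points, mutant, pop[j])
            f = fobj(min_b + trial * diff)
            if f < fitness[j]:
                fitness[j] = f
                pop[j] = trial
                if f < fitness[best_idx]:
                    best_idx = j
        # yield the denormalized population, fitness values and best index
        yield min_b + pop * diff, fitness, best_idx
```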
```python
from matplotlib.animation import FuncAnimation

# evolve the polynomial, keeping the whole population at each iteration
# (de_all is the modified generator sketched above)
result = list(de_all(rmse, [(-5, 5)] * 6, its=2000))

fig, ax = plt.subplots()

def animate(i):
    ax.clear()
    ax.set_ylim([-2, 2])
    ax.scatter(x, y)
    pop, fit, idx = result[i]
    for ind in pop:
        data = fmodel(x, ind)
        ax.plot(x, data, alpha=0.3)

anim = FuncAnimation(fig, animate, frames=len(result))
```
The animation shows how the different vectors in the population (each one corresponding
to a different curve) converge towards the solution after a few iterations.
DE variations
The schema used in this version of the algorithm is called rand/1/bin because the vectors
are randomly chosen (rand), we only used 1 vector difference and the crossover strategy
used to mix the information of the trial and the target vectors was a binomial crossover. But
there are other variants:
Mutation schemas
Rand/1: $x_{mut} = x_{r_1} + F (x_{r_2} - x_{r_3})$
Best/1: $x_{mut} = x_{best} + F (x_{r_1} - x_{r_2})$
Rand/2: $x_{mut} = x_{r_1} + F (x_{r_2} - x_{r_3}) + F (x_{r_4} - x_{r_5})$
Crossover schemas
Binomial (bin): crossover due to independent binomial experiments. Each component of the target vector has a probability p (given by crossp) of being replaced by the corresponding component of the mutant vector.
Exponential (exp): a two-point crossover operator, where two locations of the vector are randomly chosen, so that the consecutive components between the two locations are taken from the mutant vector.
Final words
Differential Evolution (DE) is a very simple but powerful algorithm for optimization of
complex functions that works pretty well in those problems where other techniques (such
as Gradient Descent) cannot be used. In this post, we’ve seen how to implement it in just 27
lines of Python with Numpy, and we’ve seen how the algorithm works step by step.
If you are looking for a Python library for black-box optimization that includes the
Differential Evolution algorithm, here are some:
Scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html). The well-known scientific library for Python includes a fast implementation of the Differential Evolution algorithm.
Yabox (https://github.com/pablormier/yabox). My own library for black-box optimization (used for the 3D plot earlier in this tutorial), which also includes an implementation of DE.
Categories: Tutorials