Multi-Algorithm Optimization
ForsChem Research Reports, 8, 2023-12
Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]
doi: 10.13140/RG.2.2.21772.49284
Abstract
The No Free Lunch (NFL) Theorem states that the average success rate of all optimization
algorithms is basically the same: certain algorithms work well for some types of problems but
fail for other types of problems. Another interpretation of the NFL Theorem is that "there is no
universal optimizer" capable of successfully and efficiently solving any type of optimization
problem. In this report, a Multi-Algorithm Optimization strategy is presented which allows
increasing the average success rate at a reasonable cost, by running a sequence of different
optimization algorithms starting from multiple random points. Optimization of different
benchmark problems performed with this algorithm showed that the particular sequence
employed and the number of starting points greatly influence the success rate and cost of the
optimization. A suggested sequence, consisting of the Broyden-Fletcher-Goldfarb-Shanno,
Nelder-Mead, and adaptive step-size One-at-a-time optimization algorithms run from random
starting points, achieved a high overall success rate with a reasonable average optimization
time for a benchmark set of different global optimization problems. The proposed method
(implemented in R language) is included in the Appendix.
1. Introduction
Optimization refers to any procedure used to find the best values of a certain set of decision
variables, that is, the values that optimize (maximize or minimize) a certain objective function
(which is a function of those decision variables) [1]. The main purpose of any optimization
algorithm is to solve optimization problems in an efficient way, that is, at a reasonable cost
(considering, for example, time, computational resources, etc.).
Cite as: Hernandez, H. (2023). Multi-Algorithm Optimization. ForsChem Research Reports, 8, 2023-12, 1 -
33. Publication Date: 28/08/2023.
There are currently hundreds of different optimization algorithms available in the scientific
literature [2]. Unfortunately, there is not a single best optimization algorithm. According to the
No Free Lunch (NFL) Theorem [3,4], the average performance of all optimization algorithms is
basically the same, as the improvement in performance in certain types of optimization
problems is compensated by the decrease in performance in other types of problems. In simple
words, the NFL theorem states that ‘‘universal optimizers are impossible.’’ [5]
The NFL Theorem is graphically illustrated in Figure 1, which shows the success rates obtained
using different optimization algorithms for the different benchmark problems considered in a
previous report [6]. While some methods show a higher average success rate, for every
method there are optimization problems with a very low success rate, as well as problems with
a very high success rate. Also, the differences in average success rates between methods are
usually not significant from a statistical point of view. In addition, higher average success rates
were typically obtained at higher optimization costs.
Figure 1. Success Rate of different optimization algorithms. Black dots: Success rate of
individual benchmark problems (1000 random runs). Green diamond: Sample average success
rate. Dotted red lines: 95% confidence intervals in the estimation of the mean success rate. NM:
Nelder-Mead. BFGS: Broyden-Fletcher-Goldfarb-Shanno. BFGS-UB: BFGS unbounded. SANN:
Simulated Annealing. OAT: Adaptive step-size One-at-a-time. OAT-UB: OAT unbounded.
The nature of the various numerical optimization algorithms available is also different.
Optimization algorithms can be classified, in a very general way, as depicted in Figure 2. First of
all, we can distinguish between methods that calculate the gradient of the function and
methods that do not require the determination of the gradient. While the gradient of a
function may indicate the most efficient route towards improving the objective function, it may
also result in the stagnation of the algorithm at local optima. In that sense, non-gradient-based
methods help overcome this limitation. Non-gradient-based methods can be further classified
into deterministic and random (or stochastic) methods, depending on the search strategy
employed. Deterministic search rules will lead to the exact same result every time the same
starting point is used for the optimization. Search rules based on random numbers, on the
contrary, may yield different results when the same starting point is considered. A particular
category of non-gradient based algorithms, presented in Figure 2, is the randomistic algorithm,
involving both deterministic and random search rules in the same algorithm. The Adaptive step-
size One-at-a-time (OAT) algorithm presented in a previous report is an example of a
randomistic optimization method [6].
Gradient-based methods are quite successful in the case of relatively simple optimization
functions (i.e. convex functions), where a local optimum is also a global optimum. For non-
convex optimization problems, gradient-based methods may fail to find the global optimum.
Random methods usually have a higher rate of success than deterministic methods,
particularly for non-convex problems, due to the exploratory nature of the random search
strategy. However, this also implies a higher optimization cost, in terms of both the number of
function evaluations and the optimization time.
The purpose of the present report is to explore the possibility of increasing the average success
rate of optimization at a reasonable cost, by combining different optimization algorithms in a
multi-algorithm optimization approach.
Section 2 describes the proposed multi-algorithm optimization method. Section 3 explains the
methodology employed to evaluate optimization performance for the multi-algorithm
optimization using different permutations of algorithms. Section 4 summarizes and discusses
the results obtained with the multi-algorithm optimization method. Finally, the Appendix
includes the corresponding functions implemented in R language (https://cran.r-project.org/).
2. Multi-Algorithm Optimization Method

The idea behind the multi-algorithm optimization method is relatively simple. It basically
consists of the execution of multiple optimization procedures in series, where the best result
obtained after each optimization procedure is used as starting point of the next algorithm. This
strategy is depicted in Figure 3.
Particularly in this report, only four different optimization algorithms are considered. They can
be regarded as representative algorithms of the different categories of algorithms previously
discussed. A brief description of each algorithm is presented next.
The representative gradient-based method is the bounded, limited-memory Broyden-Fletcher-
Goldfarb-Shanno (L-BFGS-B) algorithm [7]. At each iteration, a quadratic model of the
objective function is considered:

$$ m_k(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T B_k (x - x_k) \qquad (2.1) $$

where $x_k$ is the value of the $k$-th iteration, $f$ is the objective function, $\nabla f$ is the gradient,
and $B_k$ is an approximation to the Hessian. The main steps are:

- Compute a search direction $d_k$. Three different methods are considered: the direct primal
method, the conjugate gradient method, or the dual method.
- Perform a line search along $d_k$, subject to the bounds, using a step length $\lambda_k$ as follows:

$$ x_{k+1} = x_k + \lambda_k d_k \qquad (2.2) $$

- Compute the objective function and gradient at the new point, $f(x_{k+1})$ and $\nabla f(x_{k+1})$.
- Check for convergence, update parameter values, set $k = k + 1$ and return to the first step.
The BFGS algorithm is used in R language (https://cran.r-project.org/) using the function optim
from the stats R package (version 3.6.2), by specifying as input argument for method the text
“L-BFGS-B”.
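For illustration, a minimal call of this kind on a simple quadratic test function (the function fquad below is an assumption made purely for demonstration) would be:

#Bounded BFGS minimization of a quadratic test function
fquad <- function(x) sum((x - c(1, 2))^2) #minimum at (1, 2)
res <- optim(par = c(0, 0), fn = fquad, method = "L-BFGS-B",
             lower = c(-5, -5), upper = c(5, 5))
res$par   #best parameter values found
res$value #best objective value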
The representative non-gradient deterministic method chosen is the Nelder-Mead (NM)
algorithm [8], also known as the downhill simplex method. A general description of the
method is the following:
- Set the initial position of all $n+1$ vertices of the simplex at a fixed step along each dimension
of the problem.
- Evaluate the objective function at the vertices and establish a hierarchy according to their
values, where $x_1$ is the best and $x_{n+1}$ is the worst vertex.
- Determine the centroid ($\bar{x}$) of the vertices excluding $x_{n+1}$.
- Use the following operations for determining the new vertex:
  o Reflection: This is the starting operation. A new vertex ($x_r$) is proposed by:

$$ x_r = \bar{x} + \alpha (\bar{x} - x_{n+1}) \qquad (2.3) $$

where $\alpha > 0$ is the reflection coefficient.
  o Expansion: If reflection produced a new optimum, then the vertex is expanded to:

$$ x_e = \bar{x} + \gamma (x_r - \bar{x}) \qquad (2.4) $$

where $\gamma > 1$ is the expansion coefficient.
  o Contraction: On the other hand, if the reflected vertex remains the worst, then the new
vertex is contracted into:

$$ x_c = \bar{x} + \rho (x_{n+1} - \bar{x}) \qquad (2.5) $$

where $0 < \rho < 1$ is the contraction coefficient.
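In R, the NM algorithm is available through the same optim function of the stats package, where it is the default method. A minimal sketch follows; alpha, beta and gamma are the optim control arguments for the reflection, contraction and expansion coefficients, shown here with their documented default values (fquad is the quadratic test function defined above):

#Nelder-Mead with the default coefficients made explicit
res <- optim(par = c(0, 0), fn = fquad, method = "Nelder-Mead",
             control = list(alpha = 1.0, beta = 0.5, gamma = 2.0))
res$par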
Perhaps one of the most representative methods for non-gradient stochastic search methods
is Simulated Annealing (SANN) [9]. SANN has been inspired by the Metropolis-Monte Carlo
simulation method used in molecular modeling [10]. The main steps of this method are:
- Set an initial state ($x_0$), an initial temperature ($T_0$), and a cooling schedule, where the
temperature of the system is a function of the iteration number.
- For each new iteration $k$, select a random candidate ($x_c$) from a given probability
distribution function. Typically, a Markov kernel is considered, where the new candidate
randomly deviates from the current state ($x_k$), as follows:

$$ x_c = x_k + \epsilon_k \qquad (2.6) $$

where $\epsilon_k$ is a random deviation term following a probability model (e.g. normal,
uniform, etc.).
- Calculate the probability of acceptance for the candidate state as follows:

$$ P_{acc} = \min\left(1,\ \exp\left(\mp \frac{f(x_c) - f(x_k)}{T_k}\right)\right) \qquad (2.7) $$

The sign of the exponent is negative for minimization problems, and positive for
maximization problems. $f$ is the objective function.
- Calculate a uniform random number ($u$) to evaluate candidate acceptance:

$$ x_{k+1} = \begin{cases} x_c, & u \le P_{acc} \\ x_k, & u > P_{acc} \end{cases} \qquad (2.8) $$

- Update the temperature of the system ($T_k$) using the pre-defined cooling schedule.
- Repeat the procedure until any termination criterion is achieved.
The SANN algorithm is used in R language (https://cran.r-project.org/) using the function optim
from the stats R package (version 3.6.2), by specifying as input argument for method the text
“SANN”.
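Analogously, a minimal sketch of such a call (maxit = 10000 and temp = 10 are the documented optim defaults for this method; fquad is the quadratic test function defined earlier):

#Simulated annealing with the default control settings made explicit
res <- optim(par = c(0, 0), fn = fquad, method = "SANN",
             control = list(maxit = 10000, temp = 10))
res$par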
The non-gradient randomistic method selected for the present analysis is the Adaptive step-size
One-at-a-time (OAT) introduced in a previous report [6]. The method procedure is the
following:
- A minimum step size ($\Delta x_{min,j}$) and an initial step size ($\Delta x_{0,j}$) are defined for
each decision variable (or dimension) $j$.
- For each optimization cycle, the evaluation order for the decision variables is randomly
determined.
- For each dimension, the new value of the decision variable is calculated as:

$$ x_j^{new} = x_j^* + \delta \cdot s_j \qquad (2.9) $$

where $\delta \in \{-1, 1\}$ is the search direction, $s_j$ is the current step size, and $x_j^*$ is the
value at the current best point.
- If the new point improves the objective function, the current best point is updated and the
algorithm is accelerated by setting $s_j \leftarrow 2 s_j$. Otherwise, the new step size is randomly
decreased:

$$ s_j \leftarrow \Delta x_{min,j} \left\llbracket \frac{s_j}{(2 + 8u)\, \Delta x_{min,j}} \right\rrbracket \qquad (2.10) $$

where $u$ is a uniform random number in $[0,1]$, and $\llbracket \cdot \rrbracket$ is the
rounding-to-the-closest-integer operator.
- When $s_j \le \Delta x_{min,j}$, the search direction is switched (only once); otherwise the next
dimension is considered.
- The whole cycle is repeated until any termination criterion is achieved.
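The core step-size update of this procedure can be sketched in isolation as follows (the function name and arguments are chosen here for readability; the full implementation is given in the Appendix):

#One OAT step-size update for a single dimension
oat_step_update <- function(step, stepmin, improved) {
  if (improved) {
    2 * step #acceleration after a successful move
  } else {
    stepmin * round(step / ((2 + 8 * runif(1)) * stepmin)) #random decrease, Eq. (2.10)
  }
}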
The integration of the four algorithms into a single optimization procedure has been
implemented in R language in the preliminary maoptim function (included in the Appendix).
The procedure employs as input argument a vector of integer numbers between 1 and 4, each
denoting one of the representative methods described earlier, in the same order of
presentation: 1) BFGS, 2) Nelder-Mead, 3) SANN, and 4) OAT.
Notice that the length of the vector can be arbitrarily defined, as well as the methods
employed. Thus, any algorithm can be used repeatedly, and any algorithm can be left out of the
evaluation. By default, all algorithms are performed only once, in the corresponding arbitrarily
pre-specified order (1, 2, 3, 4).
The output of this function includes: i) The best parameter values obtained after the
procedure, ii) the best objective value found by minimization, iii) the total number of function
evaluations performed, and iv) the total duration of the optimization procedure.
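In the final version included in the Appendix, the methods are identified by name rather than by integer. A hypothetical call, consistent with that implementation (the ackley benchmark function is also defined in the Appendix), would be:

#BFGS -> NM -> OAT sequence from 10 random starting points
out <- maoptim(par = c(1, 1), fn = ackley,
               method = c("L-BFGS-B", "Nelder-Mead", "OAT"),
               lower = c(-5, -5), upper = c(5, 5), nsp = 10)
out$par    #i) best parameter values
out$value  #ii) best objective value
out$counts #iii) total number of function evaluations
out$time   #iv) total optimization time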
3. Performance Evaluation Methodology

All the optimization algorithms employed were run using the above-mentioned R functions,
always considering their corresponding default parameters. The optimization procedures were
evaluated using an Intel® Core™ i5-2400S processor @ 2.50 GHz with 8 GB RAM.
The performance of each arbitrary array of methods, considering the four algorithms
previously described, was evaluated using the same metrics reported in [6] for 25 benchmark
optimization problems (described in the Appendix). The metrics considered include the
following:
- Average Euclidean distance ($\langle d \rangle$) from the true optimum ($x_{opt}$), determined
as follows:

$$ \langle d \rangle = \frac{1}{N} \sum_{i=1}^{N} d_i \qquad (3.1) $$

where $d_i$ is the Euclidean distance of the best result found in the $i$-th of $N$ replicates,
with respect to the true optimum known for the benchmark function, given by:

$$ d_i = \sqrt{\sum_{j} \left( x_{ij}^* - x_{opt,j} \right)^2} \qquad (3.2) $$

In the case of multiple global optima, the Euclidean distance is considered with respect to the
closest optimum.
- Success rate ($SR$). For each benchmark problem, the success rate is the fraction of
successful replicates:

$$ SR = \frac{1}{N} \sum_{i=1}^{N} S_i \qquad (3.3) $$

and the overall success rate is the average of $SR$ over the set of $N_p$ benchmark problems:

$$ OSR = \frac{1}{N_p} \sum_{p=1}^{N_p} SR_p \qquad (3.4) $$

where $S_i$ indicates whether the $i$-th replicate was successful (i.e. sufficiently close to the
true optimum):

$$ S_i = \begin{cases} 1, & d_i \le \delta \\ 0, & d_i > \delta \end{cases} \qquad (3.5) $$

with $\delta$ a pre-specified distance tolerance.

- Average number of function evaluations:

$$ \langle NFE \rangle = \frac{1}{N} \sum_{i=1}^{N} NFE_i \qquad (3.6) $$

where $NFE_i$ is the number of function evaluations of each solution, determined and
reported by the optimization algorithm.

- Average optimization time:

$$ \langle t \rangle = \frac{1}{N} \sum_{i=1}^{N} t_i \qquad (3.7) $$

where $t_i$ was determined in R for each replicate using the function difftime.
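As a sketch, these metrics can be computed from a set of replicate results as follows (the result structure and the distance tolerance delta used for success are illustrative assumptions):

#Performance metrics over N replicates (illustrative)
#xbest: N x nd matrix of best points found; xtrue: known optimum
#nfes, times: vectors of function evaluation counts and run times
perfmetrics <- function(xbest, xtrue, nfes, times, delta = 1e-3) {
  d <- sqrt(rowSums((xbest - matrix(xtrue, nrow(xbest), length(xtrue), byrow = TRUE))^2))
  list(mean_dist = mean(d),              #Eq. (3.1)
       success_rate = mean(d <= delta),  #Eqs. (3.3) and (3.5)
       mean_nfe = mean(nfes),            #Eq. (3.6)
       mean_time = mean(times))          #Eq. (3.7)
}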
In the first place, a full factorial design was employed to test the effect of the particular array of
methods considered in the multi-algorithm optimization. Only four optimization steps were
considered (arrays with a length of 4), where each of the four algorithms can be used at each
step. This leads to a factorial design with 4^4 = 256 different permutations. All 25 benchmark
optimization problems were evaluated with each permutation, considering 100 random sets of
starting points. Of course, the same starting points were evaluated using each permutation.
Particularly for this design, 4 permutations involved a single algorithm only (i.e. [1,1,1,1],
[2,2,2,2], [3,3,3,3] and [4,4,4,4]), 84 permutations involved only two different algorithms, 144
permutations involved three different algorithms, and 24 permutations involved all four
algorithms at the same time.
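The enumeration of these permutations and the corresponding counts can be verified directly in R:

#All 4^4 = 256 sequences of 4 steps over 4 algorithms
perms <- expand.grid(s1 = 1:4, s2 = 1:4, s3 = 1:4, s4 = 1:4)
ndistinct <- apply(perms, 1, function(p) length(unique(p)))
table(ndistinct) #1: 4, 2: 84, 3: 144, 4: 24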
4. Results and Discussion

The overall success rate values obtained for the 256 different permutations are illustrated in
Figure 4, grouped by the number of different algorithms considered. A significant trend is
observed in the data, indicating that the combination of different optimization strategies
effectively leads to an increase in the success rate. Figure 5 shows the corresponding effect of
the number of different algorithms on the average number of total function evaluations and
average optimization times. These plots illustrate that the average optimization cost is not
significantly influenced by increasing the number of different algorithms (considering the same
number of steps). However, an increase in diversity reduces the fluctuations in the results.
Notice that the optimization cost observed in these results is strongly influenced by the SANN
algorithm. Table 1 summarizes the results shown in Figure 4 and Figure 5.
Figure 4. Overall success rate as a function of the number of different algorithms employed.
Black dots: Success rate of individual permutations considering 25 benchmark problems (100
random runs). Green diamond: Average success rate. Dotted red lines: 95% confidence
intervals in the estimation of the mean success rate.
Figure 5. Total number of function evaluations (left) and average optimization time (right) as
functions of the number of different algorithms employed. Black dots: Performance of
individual permutations considering 25 benchmark problems (100 random runs). Green
diamond: Sample average. Dotted red lines: 95% confidence intervals in the estimation of
the mean.
Table 1. Estimates of the average and standard deviation of the average of the performance
criteria as a function of the number of different algorithms.

# Different   ---- Success Rate ----    # Function Evaluations    Optimization Time (s)
Algorithms    Average   Avg. St. Dev.   Average   Avg. St. Dev.   Average   Avg. St. Dev.
1             40.5%     5.81%           10411     9865.9          0.1106    0.0877
2             49.6%     0.98%           10407     1229.0          0.1117    0.0111
3             54.8%     0.53%           10412     584.0           0.1131    0.0054
4             59.0%     0.40%           10416     62.4            0.1146    0.0032
Notice that SANN alone has a default maximum of 10000 function evaluations, which are
completed in all optimizations. Thus, while SANN performs 10000 function evaluations in the
optimization cycle, all other algorithms combined perform on average only about 400
additional function evaluations, representing only about 4% of the total optimization cost.
From these results we may also conclude that algorithm diversity improves the success rate,
partially overcoming the limitations described by the NFL theorem, while at the same time
keeping the overall optimization cost practically unchanged.
The highest success rate was obtained by the algorithm sequence [OAT,SANN,BFGS,NM], while
the lowest cost and highest $SR/\langle t \rangle$ performance ratio was obtained by the
[SANN,BFGS,OAT,NM] sequence. Only this pair of sequences will be considered for the next
analysis.
Following the idea of increasing the success rate, the algorithm sequences showing the best
performance ([OAT,SANN,BFGS,NM] and [SANN,BFGS,OAT,NM]) were evaluated considering
multiple consecutive optimization cycles. That is, the whole algorithm sequence was repeated
using the best point obtained by each cycle as the starting point for the next cycle. The
evaluation was performed for up to five consecutive cycles for each sequence. All 25
benchmark optimization problems were considered again, using new random sets of starting
points. The results obtained are summarized in Table 3 and Figure 6.
Table 3. Performance of different numbers of optimization cycles for the best algorithm
sequences employing the four different optimization algorithms.

Algorithm Sequence    Number of    Average Success   Average Optimization   Performance Ratio
                      Cycles       Rate (SR)         Time <t> (s)           SR/<t> (1/s)
[OAT,SANN,BFGS,NM]    1            61.7%             0.1391                 4.4370
[OAT,SANN,BFGS,NM]    2            67.3%             0.2347                 2.8670
[OAT,SANN,BFGS,NM]    3            69.8%             0.3316                 2.1052
[OAT,SANN,BFGS,NM]    4            69.9%             0.4285                 1.6314
[OAT,SANN,BFGS,NM]    5            71.1%             0.5246                 1.3549
[SANN,BFGS,OAT,NM]    1            59.7%             0.0997                 5.9838
[SANN,BFGS,OAT,NM]    2            66.5%             0.1980                 3.3593
[SANN,BFGS,OAT,NM]    3            70.0%             0.2953                 2.3697
[SANN,BFGS,OAT,NM]    4            71.1%             0.3929                 1.8099
[SANN,BFGS,OAT,NM]    5            72.4%             0.4898                 1.4790
First of all, we may notice that the [OAT,SANN,BFGS,NM] sequence shows a higher success
rate compared to [SANN,BFGS,OAT,NM] only for less than three cycles, while
[SANN,BFGS,OAT,NM] always shows shorter optimization times compared to
[OAT,SANN,BFGS,NM], independently of the number of cycles.
Figure 6. Overall success rate (left) and average optimization time (right) as functions of the
number of cycles of optimization sequences. Light blue diamonds: [OAT,SANN,BFGS,NM]. Dark
blue circles: [SANN,BFGS,OAT,NM].
In the second place, the optimization time increases linearly with the number of cycles, as
expected. On the other hand, while the success rate increases with the number of cycles, the
increase is less than expected. If we assume that the performance of each cycle is independent
of the previous cycles, we would expect that:

$$ SR(n_c) = 1 - \left(1 - SR(1)\right)^{n_c} \qquad (4.1) $$

where $SR(n_c)$ is the expected success rate after $n_c$ cycles, and $SR(1)$ is the observed
success rate for a single cycle.
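For example, for the [SANN,BFGS,OAT,NM] sequence with SR(1) = 59.7%, Eq. (4.1) predicts:

#Expected success rate after n independent cycles, Eq. (4.1)
sr_expected <- function(sr1, n) 1 - (1 - sr1)^n
round(sr_expected(0.597, 1:5), 3)
#0.597 0.838 0.935 0.974 0.989 (vs. observed 59.7%, 66.5%, 70.0%, 71.1%, 72.4%)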
The observed success rates are compared to the expected success rates in Figure 7.
Figure 7. Comparison between overall observed (blue) success rate and expected (green)
success rate according to Eq. (4.1) as functions of the number of cycles of optimization
sequences. Diamonds: [OAT,SANN,BFGS,NM]. Circles: [SANN,BFGS,OAT,NM].
The difference in the observed results can be explained by two reasons: 1) The probability of
success is different for each optimization problem. 2) The probability of success is different for
each starting point.
Regarding the second cause, while the average probability of success in a sample of random
starting points is $SR(1)$, the probability of success of certain points is much lower than the
average, resulting in a lower impact of the number of cycles on success. In simpler words, bad
starting points will not be successful no matter how many optimization cycles are performed.
For that reason, a new strategy will be employed in the following section to overcome this
particular difficulty.
Considering the previous results, we may conclude that choosing different random starting
points at the beginning of each cycle might be more successful than continuing the
optimization from the current best point. Such randomization of the starting point, for a better
exploration of the search region, is precisely the goal of non-gradient random search methods.
On the other hand, while the SANN algorithm has been included in the multi-algorithm
optimization method, it does not seem to be working properly within it. Among the
algorithms considered here, the SANN algorithm provided the highest success rate
when a single algorithm was used. However, it also represents almost all (about 96%) of the
optimization cost when all algorithms are used. Let us recall that SANN evaluates new random
points, but it compares the performance of each new point with the current best value. Thus,
SANN evaluates whether the new point is suitable as an optimum value, but it does not
evaluate whether the new point is a good starting point for the optimization using other
methods. For this reason, the SANN algorithm will be removed from the algorithm sequence,
and instead it will be replaced by the evaluation of the sequence for different initial random
points. This is equivalent to performing various optimization cycles simultaneously, instead of
performing them in series (where the current best is used as the starting point for the next
cycle).
Also notice the great reduction in optimization time obtained by eliminating SANN from the
list of algorithms: on average, sequences including SANN are several times slower than
sequences without SANN. Notice also that the best performance in terms of success rate was
achieved with the sequence [BFGS,OAT,NM], without any redundancy.
From these results, the first two sequences ([BFGS,NM,OAT] and [BFGS,OAT,NM]) are
selected for the next evaluation. In this case, the number of random starting points used in the
optimization is varied between 1 and 15. In addition, an evaluation with a larger number of
starting points was also included. All the starting points are randomly chosen considering a
uniform distribution within the boundaries of each decision variable. Each problem is solved
200 times for each number of starting points. Only the first starting point is identical for all
optimization runs. The results obtained are summarized in Table 5 and Figure 8.
Table 5. Effect of the number of random starting points on the performance of sequences
[BFGS,NM,OAT] and [BFGS,OAT,NM].

           ---------- [BFGS,NM,OAT] ----------    ---------- [BFGS,OAT,NM] ----------
#Starting  Avg. Success  Avg. Optim.  Performance  Avg. Success  Avg. Optim.  Performance
Points     Rate          Time (s)     Ratio (1/s)  Rate          Time (s)     Ratio (1/s)
1          47.3%         0.0118       40.21        49.8%         0.0135       36.93
2          60.6%         0.0234       25.83        64.8%         0.0246       26.35
3          68.2%         0.0334       20.41        72.5%         0.0381       19.03
4          72.6%         0.0433       16.77        77.7%         0.0497       15.62
5          74.9%         0.0516       14.50        80.4%         0.0631       12.75
6          77.1%         0.0593       12.99        82.7%         0.0750       11.02
7          79.0%         0.0686       11.51        83.9%         0.0867       9.68
8          79.9%         0.0798       10.01        85.5%         0.0988       8.65
9          81.3%         0.0892       9.12         85.9%         0.1105       7.78
10         81.5%         0.0993       8.20         87.4%         0.1242       7.04
11         82.2%         0.1109       7.41         88.1%         0.1332       6.61
12         83.0%         0.1157       7.17         88.7%         0.1436       6.17
13         83.3%         0.1275       6.53         88.7%         0.1606       5.52
14         84.0%         0.1391       6.04         90.1%         0.1703       5.29
15         84.2%         0.1495       5.63         90.3%         0.1833       4.93
Figure 8. Overall success rate (left) and average optimization time (right) as functions of the
number of random starting points. Light blue diamonds: [BFGS,NM,OAT]. Dark blue circles:
[BFGS,OAT,NM]. Solid green line: Eq. (4.1).
The fitted (dashed) curves shown in Figure 8 correspond to the following empirical models. In
the case of average success rates:

$$ SR_{[BFGS,NM,OAT]}(n_{sp}) = a_1 \left(1 + \mathrm{erf}\left(b_1 \log_{10}(n_{sp}) + c_1\right)\right) \qquad (4.2) $$

$$ SR_{[BFGS,OAT,NM]}(n_{sp}) = a_2 \left(1 + \mathrm{erf}\left(b_2 \log_{10}(n_{sp}) + c_2\right)\right) \qquad (4.3) $$

where $n_{sp}$ represents the number of initial starting points, $\mathrm{erf}$ is the error
function, $\log_{10}$ is the decimal logarithm function, and the $a$, $b$ and $c$ terms are fitted
coefficients.
In general, the success rate is higher for the sequence [BFGS,OAT,NM] than for
[BFGS,NM,OAT] considering any arbitrary number of starting points. However, both were
lower than the theoretical success rate determined by Eq. (4.1) (using an intermediate value of
the two observed single-start success rates as the success rate for a single starting point).
In the case of average optimization times, the following linear models were obtained:

$$ \langle t \rangle_{[BFGS,NM,OAT]}(n_{sp}) = t_{0,1} + t_{1,1}\, n_{sp} \qquad (4.4) $$

$$ \langle t \rangle_{[BFGS,OAT,NM]}(n_{sp}) = t_{0,2} + t_{1,2}\, n_{sp} \qquad (4.5) $$

where the $t_0$ and $t_1$ terms are fitted coefficients. From Table 5, the marginal cost per
additional starting point is roughly 0.010 s for [BFGS,NM,OAT] and 0.012 s for [BFGS,OAT,NM].
Thus, the optimization time for the sequence [BFGS,OAT,NM] is somewhat (about 20%) longer
than for the sequence [BFGS,NM,OAT]. In addition, using about 10 random starting points
results in optimization times similar to those obtained by including SANN in the optimization
sequence, while also resulting in higher success rates.
The difference between the theoretical and the observed success rates can be attributed to the
different success rates of each problem. Table 6 shows the individual success rates obtained by
each sequence for each particular problem, using a single starting point and using the larger
number of starting points.
Table 6. Individual success rates for the different optimization problems obtained with
sequences [BFGS,OAT,NM] and [BFGS,NM,OAT], for a single starting point (first column of
each sequence) and for a large number of starting points (second column). Green: 100%
success rate. Red: low success rate.

Problem Function   --- [BFGS,OAT,NM] ---      --- [BFGS,NM,OAT] ---
                   Single SP   Multiple SP    Single SP   Multiple SP
ackley 56.0% 100% 52.5% 100%
beale 87.0% 100% 84.0% 100%
booth 100% 100% 100% 100%
bukin6 4.0% 20.5% 1.5% 98.5%
camel 81.0% 100% 82.0% 100%
crossintray 57.5% 100% 66.5% 100%
easom 18.5% 100% 13.0% 100%
eggholder 1.5% 81.5% 0.5% 75.5%
goldsteinprice 55.0% 100% 68.5% 100%
gomezlevyC 46.5% 100% 50.0% 100%
himmelblau 21.0% 48.5% 22.0% 100%
hoeldertable 22.5% 100% 50.0% 100%
levi13 53.0% 100% 45.0% 100%
matyas 100% 100% 100% 100%
mccormick 62.5% 100% 56.0% 100%
mishraC 50.5% 100% 54.0% 100%
rastrigin 33.0% 100% 28.5% 100%
rosenbrock 100% 100% 100% 100%
rosenbrockC 2.0% 53.0% 4.5% 99.0%
schaffer2 26.5% 100% 32.0% 100%
schaffer4 15.0% 100% 21.5% 100%
simionescuC 51.5% 100% 68.5% 100%
sphere 100% 100% 100% 100%
styblinskitang 36.5% 100% 37.5% 100%
townsendC 1.5% 95.5% 8.0% 100%
Overall 47.30% 91.96% 49.84% 98.92%
First of all, notice that some "easy" problems already achieved a 100% success rate with a
single starting point. In addition, most optimization problems reached a 100% success rate
when multiple starting points were considered, with the exception of the most "difficult"
problems. Those "difficult" problems also resulted in low success rates for a single starting
point, commonly below 5%. For example, a "difficult" problem like the eggholder function,
showing a success rate of only about 1% for a single starting point, results in a theoretical
individual success rate well below 100% even with many independent starting points, thus
decreasing the overall average success rate for the set of benchmark problems considered
with respect to the theoretical average.
Figure 9 illustrates the evolution of the success rate of individual problems as a function of the
number of starting points for the sequence [BFGS,NM,OAT]. In this graph we can clearly
distinguish a gap between "easy" and "difficult" problems, where "easy" problems are found at
the top of the graph and "difficult" problems at the bottom. Even though "difficult" problems
improve more slowly than "easy" problems, it is clear that increasing the number of starting
points improves their individual success rates.
Figure 9. Success Rate of individual problems as a function of the number of starting points
( ) using the [BFGS,NM,OAT] sequence. Black dots: Success rate of individual benchmark
problems (200 random runs). Green diamond: Sample average success rate. Dotted red lines:
95% confidence intervals in the estimation of the mean success rate.
5. Conclusion
From the results obtained we may conclude that a "universal optimizer", with an almost 100%
success rate for any type of optimization problem, is possible by considering a large number of
starting points randomly distributed in the search region. Of course, the cost of such a strategy
may be prohibitive in most practical situations. The efficiency of such a "universal optimizer"
can be increased (optimization costs decreased) by using a diversification strategy where
multiple optimization algorithms (having different natures) are used simultaneously (either in
sequence or in parallel). This strategy may overcome some limitations described by the No
Free Lunch theorem [3-5]. With the suggested [BFGS,NM,OAT] sequence and a large number
of random starting points, the overall success rate obtained for the benchmark set of problems
considered was already close to 99%, with a short average optimization time per run.
This report provides data, information and conclusions obtained by the author(s) as a result of original
scientific research, based on the best scientific knowledge available to the author(s). The main purpose
of this publication is the open sharing of scientific knowledge. Any mistake, omission, error or inaccuracy
published, if any, is completely unintentional.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-
for-profit sectors.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes were made. This can be done in any reasonable manner, but not in any way that
suggests endorsement by the licensor.
NonCommercial: This material may not be used for commercial purposes.
References
[7] Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained
optimization. SIAM Journal on Scientific Computing, 16 (5), 1190-1208. doi: 10.1137/0916069.
[8] Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer
Journal, 7 (4), 308-313. doi: 10.1093/comjnl/7.4.308.
[9] Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on
$\mathbb{R}^d$. Journal of Applied Probability, 29 (4), 885-895. doi: 10.2307/3214721.
[10] Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical
Association, 44 (247), 335-341. doi: 10.1080/01621459.1949.10483310.
Appendix
The adaptive step-size OAT optimization function OAToptim. The opening of this listing
(function signature and initialization) is reconstructed here as a hypothetical completion,
consistent with the defaults used in the body of the function:

OAToptim<-function(fun,x0=NA,lower=-Inf,upper=Inf,step0=NA,stepmin=NA,
ncycles=100,tol=1e-6,MCcheck=10,display=FALSE,optmode="min"){
#Reconstructed opening (hypothetical): timer, dimensionality and argument recycling
t0=Sys.time()
nd=max(length(x0),length(lower),length(upper))
x0=rep(x0,length.out=nd)
lower=rep(lower,length.out=nd)
upper=rep(upper,length.out=nd)
step0=rep(step0,length.out=nd)
stepmin=rep(stepmin,length.out=nd)
for (i in 1:nd){
#Starting point (randomized within the bounds when not provided)
if (is.na(x0[i]) | is.nan(x0[i]) | is.infinite(x0[i])){
xmin=lower[i]
xmax=upper[i]
if(is.infinite(lower[i])==TRUE){
xmin=0
}
if(is.infinite(upper[i])==TRUE){
xmax=1
}
x0[i]=xmin+(xmax-xmin)*round(1000*runif(1))/1000
}
#Initial step size
if (is.na(step0[i]) | is.nan(step0[i]) | is.null(step0[i]) | is.infinite(step0[i])==
TRUE){
step0[i]=0
} else {
step0[i]=abs(step0[i])
}
if (step0[i]==0){
if(is.infinite(lower[i])==TRUE | is.infinite(upper[i])==TRUE){
if(x0[i]==0){
step0[i]=1
} else {
step0[i]=x0[i]/10
}
} else {
step0[i]=(upper[i]-lower[i])/10
}
}
#Minimum step size
if (is.na(stepmin[i]) | is.nan(stepmin[i]) | is.null(stepmin[i]) |
is.infinite(stepmin[i])==TRUE){
stepmin[i]=0
} else {
stepmin[i]=abs(stepmin[i])
}
if (stepmin[i]==0){
stepmin[i]=step0[i]/1000
}
#Minimum step size correction
step0[i]=stepmin[i]*ceiling(step0[i]/stepmin[i])
x0[i]=stepmin[i]*round(x0[i]/stepmin[i])
}
#Perturbation scale for the Monte Carlo check (assumed equal to the initial step size)
GMstep=step0
#Tolerance
if(is.na(tol)==TRUE | is.nan(tol)==TRUE | is.null(tol)==TRUE | is.infinite(tol)==TRUE){
tol=1e-6
} else {
tol=abs(tol)
}
#Number of cycles
if(is.na(ncycles)==TRUE | is.nan(ncycles)==TRUE | is.null(ncycles)==TRUE |
is.infinite(ncycles)==TRUE){
ncycles=100
} else {
ncycles=max(1,abs(round(ncycles)))
}
#Monte Carlo check
if(is.na(MCcheck)==TRUE | is.nan(MCcheck)==TRUE | is.null(MCcheck)==TRUE |
is.infinite(MCcheck)==TRUE){
MCcheck=10
} else {
MCcheck=abs(round(MCcheck))
}
#Optimization mode
optmode=substr(optmode[1],1,3)
if (optmode=="Max" | optmode=="MAX"){
optmode="max"
}
#Current best
xopt=x0
if (display==TRUE){
print('Initial point: ')
print(x0)
}
Fobj=tol*round(fun(x0)/tol)
nfeval=1
if (display==TRUE) print(paste('Initial objective function: ',Fobj))
for (i in 1:ncycles){
if (display==TRUE) print(paste('Cycle ',i,'/',ncycles))
xopt0=xopt
cycleorder=sample(1:nd)
for (k in 1:nd){
j=cycleorder[k]
if (display==TRUE) print(paste('Variable ',j,' (',k,'/',nd,')'))
exit=0
dir=1
step=step0[j]
paramv=xopt[j]
paramvopt=paramv
while (dir>=-1){
while (exit==0){
paramv=stepmin[j]*round((paramvopt+dir*step)/stepmin[j])
paramv=max(paramv,lower[j])
paramv=min(paramv,upper[j])
x=xopt
x[j]=paramv
if (optmode=='max'){
Fobjnew=-Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=-Inf
if (Fobjnew>(Fobj+tol)){
paramvopt=paramv
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
step=step*2
} else {
if (step<=stepmin[j]){
exit=1
} else {
step=stepmin[j]*round(step/((2+8*runif(1))*stepmin[j]))
}
}
} else {
Fobjnew=Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=Inf
if (Fobjnew<(Fobj-tol)){
paramvopt=paramv
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
step=step*2
} else {
if (step<=stepmin[j] | abs(Fobj)<tol){
exit=1
} else {
step=stepmin[j]*round(step/((2+8*runif(1))*stepmin[j]))
}
}
}
}
if (dir==1){
exit=0
dir=-1
step=step0[j]
} else {
dir=-2
step=step0[j]
}
}
xopt[j]=paramvopt
if (display==TRUE) print(paste('Best variable value =',xopt[j]))
}
if (max(abs(xopt-xopt0))==0){
if (MCcheck>0){
if (display==TRUE) print('Initiating Monte Carlo check of the optimum.')
for (m in 1:MCcheck){
x=stepmin*round((xopt+GMstep*rnorm(nd))/stepmin)
x=pmax(x,lower)
x=pmin(x,upper)
if (optmode=='max'){
Fobjnew=-Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=-Inf
if (Fobjnew>(Fobj+tol)){
xopt=x
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
}
} else {
Fobjnew=Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=Inf
if (Fobjnew<(Fobj-tol)){
xopt=x
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
}
}
}
}
if (max(abs(xopt-xopt0))==0){
if (display==TRUE) print('No further improvement in the objective function. Terminating the optimizer.')
break
} else {
if (display==TRUE) print('Resuming the OAT optimizer.')
}
}
}
if (i==ncycles & display==TRUE) print('Maximum number of cycles reached. Terminating the optimizer.')
ctime=difftime(Sys.time(),t0,units="secs")
if (display==TRUE){
print('Best point found:')
print(xopt)
print(paste('Best objective function: ',Fobj))
print(paste('Number of function evaluations: ',nfeval))
print(paste('Optimization time (s): ',ctime))
}
return(list(xopt,Fobj,nfeval,ctime))
}
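A minimal usage example of OAToptim (illustrative; the sphere benchmark function is defined below in this Appendix):

#OAT minimization of the sphere function
res=OAToptim(fun=sphere,x0=c(3,-2),lower=c(-5,-5),upper=c(5,5))
res[[1]] #best point found
res[[2]] #best objective value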
The multi-algorithm optimization function maoptim. As above, the opening lines (function
signature and the optimization-sense flag s) are a hypothetical reconstruction consistent with
the body of the function:

maoptim<-function(par,fn,gr=NULL,method=c("L-BFGS-B","Nelder-Mead","SANN","OAT"),
lower=-Inf,upper=Inf,control=list(),hessian=FALSE,nsp=1){
#Reconstructed opening (hypothetical): timer and optimization sense
t0=Sys.time()
s=1 #s=1: minimization; s=-1: maximization (assumed to follow optim's fnscale convention)
if (is.numeric(control$fnscale)){
if (control$fnscale[1]<0) s=-1
}
methodv=method
N=length(methodv)
n=length(par)
counts=0
par0=par
valueopt=s*Inf #initialized so that the first completed run is accepted for either sense
for (j in 1:nsp){
if (j>1){
par=lower+(upper-lower)*runif(n)
if(max(is.nan(par))==1) par=paropt+pmax(abs(paropt),abs(par0))*rnorm(n)
}
for (i in 1:N){
if ((min(lower)==-Inf) | (max(upper)==Inf)){
if (methodv[i]=="L-BFGS-B") methodv[i]="BFGS"
} else {
if (n==1 & methodv[i]=="Nelder-Mead") methodv[i]="Brent"
}
if (methodv[i]=="OAT"){
if (is.numeric(control$step0)==TRUE){
step0=control$step0
} else {
step0=(upper-lower)/5
}
if (is.numeric(control$stepmin)==TRUE){
stepmin=control$stepmin
} else {
stepmin=1e-6
}
if (is.numeric(control$ncycles)==TRUE){
ncycles=control$ncycles
} else {
ncycles=1000
}
if (is.numeric(control$tol)==TRUE){
tol=control$tol
} else {
tol=1e-6
}
if (is.numeric(control$MCcheck)==TRUE){
MCcheck=control$MCcheck
} else {
MCcheck=30
}
if (is.logical(control$display)==TRUE){
display=control$display
} else {
display=FALSE
}
if (s==-1) {
optmode="max"
} else {
optmode="min"
}
OUT=OAToptim(fun=fn,x0=par,lower=lower,upper=upper,step0=step0,stepmin=stepmin,ncycles=ncycles,tol=tol,MCcheck=MCcheck,display=display,optmode=optmode)
par=OUT[[1]]
value=OUT[[2]]
counts=counts+OUT[[3]]
} else {
OUT=optim(par=par,fn=fn,gr=gr,method=methodv[i],lower=lower,upper=upper,control=control,hessian=hessian)
par=OUT$par
value=OUT$value
counts=counts+OUT$counts[1]
}
}
if (s*value<s*valueopt){
valueopt=value
paropt=par
}
}
optime=Sys.time()-t0
return(list(par=paropt,value=valueopt,counts=counts,time=optime))
}
Ackley Function
$$ f(x) = -20 \exp\left(-0.2\sqrt{\tfrac{1}{2}\left(x_1^2+x_2^2\right)}\right) - \exp\left(\tfrac{1}{2}\left(\cos 2\pi x_1 + \cos 2\pi x_2\right)\right) + e + 20 $$
ackley<-function(x){
f=-20*exp(-0.2*sqrt((x[1]^2+x[2]^2)/2))-exp(0.5*(cos(2*pi*x[1])+cos(2*pi*x[2])))+exp(1)+20
return(f)
}
Beale Function
$$ f(x) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2 $$
beale<-function(x){
f=(1.5-x[1]+x[1]*x[2])^2+(2.25-x[1]+x[1]*x[2]*x[2])^2+(2.625-x[1]+x[1]*(x[2]^3))^2
return(f)
}
Booth Function
$$ f(x) = (x_1 + 2x_2 - 7)^2 + (2x_1 + x_2 - 5)^2 $$
booth<-function(x){
f=(x[1]+2*x[2]-7)^2+(2*x[1]+x[2]-5)^2
return(f)
}
Bukin Function # 6
$$ f(x) = 100\sqrt{\left|x_2 - 0.01 x_1^2\right|} + 0.01\left|x_1 + 10\right| $$
bukin6<-function(x){
f=100*sqrt(abs(x[2]-0.01*x[1]^2))+0.01*abs(x[1]+10)
return(f)
}
Camel Function
$$ f(x) = 2x_1^2 - 1.05 x_1^4 + \tfrac{x_1^6}{6} + x_1 x_2 + x_2^2 $$
camel<-function(x){
f=2*x[1]^2-1.05*x[1]^4+(x[1]^6)/6+x[1]*x[2]+x[2]^2
return(f)
}
Cross-in-Tray Function
$$ f(x) = -0.0001\left(\left|\sin x_1 \sin x_2 \exp\left(\left|100 - \tfrac{\sqrt{x_1^2+x_2^2}}{\pi}\right|\right)\right| + 1\right)^{0.1} $$
crossintray<-function(x){
f=-0.0001*(1+abs(sin(x[1])*sin(x[2])*exp(abs(100-sqrt(x[1]^2+x[2]^2)/pi))))^0.1
return(f)
}
Easom Function
$$ f(x) = -\cos x_1 \cos x_2 \exp\left(-\left((x_1 - \pi)^2 + (x_2 - \pi)^2\right)\right) $$
easom<-function(x){
f=-cos(x[1])*cos(x[2])*exp(-((x[1]-pi)^2+(x[2]-pi)^2))
return(f)
}
Eggholder Function
$$ f(x) = -(x_2 + 47)\sin\sqrt{\left|\tfrac{x_1}{2} + x_2 + 47\right|} - x_1 \sin\sqrt{\left|x_1 - (x_2 + 47)\right|} $$
eggholder<-function(x){
f=-(x[2]+47)*sin(sqrt(abs(0.5*x[1]+x[2]+47)))-x[1]*sin(sqrt(abs(x[1]-x[2]-47)))
return(f)
}
Goldstein-Price Function
$$ f(x) = \left(1 + (x_1+x_2+1)^2\left(19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1 x_2 + 3x_2^2\right)\right) \cdot \left(30 + (2x_1 - 3x_2)^2\left(18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1 x_2 + 27x_2^2\right)\right) $$
goldsteinprice<-function(x){
f=(1+(x[1]+x[2]+1)^2*(19-14*x[1]+3*x[1]^2-14*x[2]+6*x[1]*x[2]+3*x[2]^2))*(30+(2*x[1]-
3*x[2])^2*(18-32*x[1]+12*x[1]^2+48*x[2]-36*x[1]*x[2]+27*x[2]^2))
return(f)
}
Gomez-Levy Function (constrained)
$$ f(x) = 4x_1^2 - 2.1x_1^4 + \tfrac{x_1^6}{3} + x_1 x_2 - 4x_2^2 + 4x_2^4, \quad \text{subject to: } 2\sin^2(2\pi x_2) - \sin(4\pi x_1) \le 1.5 $$
gomezlevyC<-function(x){
f=4*x[1]^2-2.1*x[1]^4+(x[1]^6)/3+x[1]*x[2]-4*x[2]^2+4*x[2]^4
if (2*(sin(2*pi*x[2]))^2-sin(4*pi*x[1])>1.5) f=Inf
return(f)
}
Himmelblau Function
$$ f(x) = \left(x_1^2 + x_2 - 11\right)^2 + \left(x_1 + x_2^2 - 7\right)^2 $$
himmelblau<-function(x){
f=(x[1]^2+x[2]-11)^2+(x[1]+x[2]^2-7)^2
return(f)
}
Hölder Table Function
$$ f(x) = -\left|\sin x_1 \cos x_2 \exp\left(\left|1 - \tfrac{\sqrt{x_1^2+x_2^2}}{\pi}\right|\right)\right| $$
hoeldertable<-function(x){
f=-abs(sin(x[1])*cos(x[2])*exp(abs(1-sqrt(x[1]^2+x[2]^2)/pi)))
return(f)
}
Lévi Function # 13
$$ f(x) = \sin^2(3\pi x_1) + (x_1 - 1)^2\left(1 + \sin^2(3\pi x_2)\right) + (x_2 - 1)^2\left(1 + \sin^2(2\pi x_2)\right) $$
levi13<-function(x){
f=(sin(3*pi*x[1]))^2+(x[1]-1)^2*(1+(sin(3*pi*x[2]))^2)+(x[2]-1)^2*(1+(sin(2*pi*x[2]))^2)
return(f)
}
Matyas Function
$$ f(x) = 0.26\left(x_1^2 + x_2^2\right) - 0.48\, x_1 x_2 $$
matyas<-function(x){
f=0.26*(x[1]^2+x[2]^2)-0.48*x[1]*x[2]
return(f)
}
McCormick Function
$$ f(x) = \sin(x_1 + x_2) + (x_1 - x_2)^2 - 1.5 x_1 + 2.5 x_2 + 1 $$
mccormick<-function(x){
f=sin(x[1]+x[2])+(x[1]-x[2])^2-1.5*x[1]+2.5*x[2]+1
return(f)
}
Mishra Bird Function (constrained)
$$ f(x) = \sin(x_2)\, e^{(1-\cos x_1)^2} + \cos(x_1)\, e^{(1-\sin x_2)^2} + (x_1 - x_2)^2, \quad \text{subject to: } (x_1+5)^2 + (x_2+5)^2 < 25 $$
mishraC<-function(x){
f=sin(x[2])*exp((1-cos(x[1]))^2)+cos(x[1])*exp((1-sin(x[2]))^2)+(x[1]-x[2])^2
if ((x[1]+5)^2+(x[2]+5)^2>=25) f=Inf
return(f)
}
Rastrigin Function
$$ f(x) = 10\, n_d + \sum_{i=1}^{n_d}\left(x_i^2 - 10\cos(2\pi x_i)\right) $$
rastrigin<-function(x){
nd=length(x)
f=10*nd
for (i in 1:nd){
f=f+x[i]^2-10*cos(2*pi*x[i])
}
return(f)
}
Rosenbrock Function
$$ f(x) = \sum_{i=1}^{n_d - 1}\left(100\left(x_{i+1} - x_i^2\right)^2 + (1 - x_i)^2\right) $$
rosenbrock<-function(x){
nd=length(x)
f=0
for (i in 1:(nd-1)){
f=f+100*((x[i+1]-x[i]^2)^2)+(1-x[i])^2
}
return(f)
}
Rosenbrock Function (constrained)
$$ f(x) = 100\left(x_2 - x_1^2\right)^2 + (1 - x_1)^2, \quad \text{subject to: } (x_1 - 1)^3 - x_2 + 1 \le 0 \ \text{ and } \ x_1 + x_2 - 2 \le 0 $$
rosenbrockC<-function(x){
f=100*((x[2]-x[1]^2)^2)+(1-x[1])^2
if ((x[1]-1)^3-x[2]+1>0 | x[1]+x[2]-2>0) f=Inf
return(f)
}
Schaffer Function #2
$$ f(x) = 0.5 + \frac{\sin^2\left(x_1^2 - x_2^2\right) - 0.5}{\left(1 + 0.001\left(x_1^2 + x_2^2\right)\right)^2} $$
schaffer2<-function(x){
f=0.5+((sin(x[1]^2-x[2]^2))^2-0.5)/(1+0.001*(x[1]^2+x[2]^2))^2
return(f)
}
Schaffer Function #4
$$ f(x) = 0.5 + \frac{\cos^2\left(\sin\left|x_1^2 - x_2^2\right|\right) - 0.5}{\left(1 + 0.001\left(x_1^2 + x_2^2\right)\right)^2} $$
schaffer4<-function(x){
f=0.5+((cos(sin(abs(x[1]^2-x[2]^2))))^2-0.5)/(1+0.001*(x[1]^2+x[2]^2))^2
return(f)
}
Sphere Function
$$ f(x) = \sum_{i=1}^{n_d} x_i^2 $$
sphere<-function(x){
nd=length(x)
f=0
for (i in 1:nd){
f=f+x[i]^2
}
return(f)
}
Simionescu Function (constrained)
$$ f(x) = 0.1\, x_1 x_2, \quad \text{subject to: } x_1^2 + x_2^2 \le \left(1 + 0.2\cos\left(8\arctan\tfrac{x_1}{x_2}\right)\right)^2 $$
simionescuC<-function(x){
f=0.1*x[1]*x[2]
c=((1+0.2*cos(8*atan(x[1]/x[2])))^2)
if (is.nan(c)==TRUE) c=Inf
if (x[1]^2+x[2]^2>c) f=Inf
return(f)
}
Styblinski-Tang Function
$$ f(x) = \frac{1}{2}\sum_{i=1}^{n_d}\left(x_i^4 - 16 x_i^2 + 5 x_i\right) $$
styblinskitang<-function(x){
nd=length(x)
f=0
for (i in 1:nd){
f=f+0.5*(x[i]^4-16*x[i]^2+5*x[i])
}
return(f)
}
Townsend Function (constrained)
$$ f(x) = -\cos^2\left((x_1 - 0.1)\, x_2\right) - x_1 \sin(3x_1 + x_2), \quad \text{subject to: } x_1^2 + x_2^2 < \left(2\cos t - \tfrac{1}{2}\cos 2t - \tfrac{1}{4}\cos 3t - \tfrac{1}{8}\cos 4t\right)^2 + \left(2\sin t\right)^2, \ t = \mathrm{atan2}(x_1, x_2) $$
townsendC<-function(x){
f=-(cos(x[2]*(x[1]-0.1)))^2-x[1]*sin(3*x[1]+x[2])
t=atan2(x[1],x[2])
if (x[1]^2+x[2]^2>=(2*cos(t)-0.5*cos(2*t)-0.25*cos(3*t)-0.125*cos(4*t))^2+(2*sin(t))^2)
f=Inf
return(f)
}
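Finally, a sketch of a success-rate experiment on a single benchmark problem, tying the method to the metrics of Section 3 (the replicate count, the distance tolerance, the search bounds, and the maoptim call follow the assumptions stated earlier):

#Estimating the success rate of a sequence on the Ackley problem
nrep=100
success=0
for (r in 1:nrep){
x0=runif(2,-5,5) #random starting point within the bounds
out=maoptim(par=x0,fn=ackley,method=c("L-BFGS-B","Nelder-Mead","OAT"),
lower=c(-5,-5),upper=c(5,5),nsp=10)
d=sqrt(sum((out$par-c(0,0))^2)) #distance to the known optimum at (0,0), Eq. (3.2)
if (d<=1e-3) success=success+1
}
success/nrep #estimated success rate, Eq. (3.3)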