An Adaptive Simulated Annealing Algorithm
Stochastic Processes and their Applications 94 (2001) 95–103
Received 23 November 1999; received in revised form 3 January 2001; accepted 3 January 2001
Abstract
In this paper, inspired by the idea of the Metropolis algorithm, a new sample-adaptive simulated annealing algorithm is constructed on a finite state space. This new algorithm can be considered as a substitute for the annealing of iterative stochastic schemes. The convergence of the algorithm is shown.
© 2001 Elsevier Science B.V. All rights reserved.
1. Introduction
∗ Corresponding author. Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, People's Republic of China.
E-mail addresses: [email protected] (G. Gong), [email protected], [email protected] (Y. Liu), [email protected] (M. Qian).
1 Supported by NSFC of China 79970120.
2 Supported by NSFC of China 19971005 and the Doctoral Program Foundation of Institutions of Higher Education.
where $r_n$ is a sequence of small gains and $\xi_n$ is the input of the system at time $n$, either deterministic or stochastic. This model can be illustrated by practical setups. Let us take the recognition of off-line handwritten Chinese characters as an example. In this case, samples of handwritten Chinese characters are read in one by one, denoted by $\xi_1, \ldots, \xi_n, \ldots$, drawn from the population of handwritten Chinese characters. Assume that this population is described by a random vector $\xi$, i.e. $\xi_1, \ldots, \xi_n, \ldots$ are i.i.d. copies of $\xi$. The bias of a candidate point $x$ from the samples is measured by the following objective function:
$$U(x) = E \min_{i \le m} \|x^{(i)} - \xi\|^2, \qquad x = (x^{(1)}, \ldots, x^{(m)}).$$
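For concreteness, a minimal numpy sketch of a Monte Carlo estimate of this objective is given below; the function name estimate_U, the array shapes, and the reading of $\|\cdot\|^2$ as squared Euclidean distance are illustrative assumptions, not details from the paper.

    import numpy as np

    def estimate_U(x, samples):
        """Monte Carlo estimate of U(x) = E min_{i <= m} ||x^(i) - xi||^2.

        x       : (m, d) array holding the candidate points x^(1), ..., x^(m)
        samples : (n, d) array of i.i.d. observations xi_1, ..., xi_n of xi
        """
        # Squared Euclidean distance from every sample to every candidate,
        # shape (n, m).
        d2 = ((samples[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
        # Empirical mean of the distance to the nearest candidate point.
        return d2.min(axis=1).mean()

In the handwriting example, each row of samples would be a feature vector of one observed character and the rows of x the $m$ prototype characters being fitted.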
Suppose that $b(x)$ derives from a potential function $U(x)$ and $E\xi_n = 0$; then, under some restrictions on the behavior of $b(x)$ and $b(x, \xi_n)$ at infinity, for $\{r_n\}$ in a wide class one can always choose $\{h_n\}$ such that for any $\varepsilon > 0$,
$$P_{0,y}\Bigl\{U(X_n) < \min_{z \in \mathbb{R}^d} U(z) + \varepsilon\Bigr\} \to 1,$$
where $K(n)$ is a function from $\mathbb{N}$ to $\mathbb{N}$ (the set of natural numbers) satisfying $K(n) \ge K(n-1)$ and $\lim_{n\to\infty} K(n) = \infty$. Denote $B \equiv \{\varphi: X \to \mathbb{R}\}$. For any $\varphi \in B$, let $\pi_{\beta_n,\omega}(\varphi) \equiv \sum_{x \in X} \pi_{\beta_n,\omega}(x)\varphi(x)$ and $\|\varphi\|_\infty = \sup_{x \in X} |\varphi(x)|$.
We consider the coordinate process $\{Y_n;\ n = 1, 2, \ldots\}$, $Y_n(\bar\omega) = \bar\omega_n$, $\bar\omega \in X^\infty$, on the coordinate space $X^\infty$, and a random probability measure $Q_\omega$ on $X^\infty$ such that $Y_n$ is a Markov chain with transition probabilities $q_{\beta_n,\omega}(x, y)$.
$$m_{n,\omega} \equiv H_{n,\omega}(x, y) - S_{n,\omega}(x) - S_{n,\omega}(y) + \min_{z \in X} S_{n,\omega}(z), \qquad x, y \in X,$$
$$m \equiv H(x, y) - U(x) - U(y) + \min_{z \in X} U(z), \qquad x, y \in X.$$
Obviously, if $\|S_{n,\omega} - U\|_\infty < \varepsilon$, then $|m_{n,\omega} - m| < 4\varepsilon$. Due to the results of Holley and Stroock (1988), Löwe (1995) and Diaconis and Stroock (1991), we have the following proposition.
Proposition 1.1. There exists a constant $C > 0$, independent of $n$ and $\omega$, such that for any $\beta_n > 0$,
$$\lambda_{\beta_n,\omega} \ge C\, e^{-\beta_n m_{n,\omega}}.$$
We define $\lambda \equiv \min_{x \in X} U(x)$. Let $x^*$ satisfy $U(x^*) = \lambda$. Then for any $\varepsilon > 0$, we have

And then the lemma follows from the law of large numbers.
Remark 1.1. It is necessary for practical computation that $c$, $\gamma$, $C_1$, $C_{2,\delta}$, and $\beta_n$ be independent of $\omega$.
Remark 1.4. Actually, we can choose $K(n) = n^2$; then $K(n)$ satisfies the conditions of the theorem.
Our algorithm is designed for discrete time and a finite state space. It may be considered as a substitute for the annealing of iterative stochastic schemes (1.2) in the case of a finite state space, and it can hopefully be extended to the denumerable state case with some necessary modifications.
Comparing our SA algorithm with (1.2), the probability transition matrix $q_0(x, y)$ is analogous to the distribution of the artificial noise $\xi_n$ in (1.2), and $\exp[-\beta_n (S_{K(n),\omega}(y) - S_{K(n),\omega}(x))^+]$ is analogous to $h_n$ in (1.2).
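To make the analogy concrete, the following Python sketch runs one trajectory of such a sample-adaptive annealing chain under stated assumptions: a logarithmic cooling schedule $\beta_n = \log(n+1)/c$, the sample-size schedule $K(n) = n^2$ of Remark 1.4, and empirical energies $S_{K(n)}$ maintained for every state (feasible only for a small state space $X$). The names adaptive_sa and observe_U are illustrative, not from the paper.

    import numpy as np

    def adaptive_sa(N, observe_U, q0, c, n_steps, K=lambda n: n * n, seed=0):
        """Sample-adaptive SA sketch: Metropolis steps driven by the
        empirical energy S_{K(n)} in place of the unknown U.

        N         : size of the finite state space X = {0, ..., N-1}
        observe_U : observe_U(x) -> one noisy observation U_j(x) with mean U(x)
        q0        : (N, N) proposal transition matrix q0(x, y)
        c         : cooling constant (the main theorem requires c > m)
        K         : sample-size schedule; Remark 1.4 allows K(n) = n**2
        """
        rng = np.random.default_rng(seed)
        sums = np.zeros(N)        # running sums of the observations U_j(x)
        count = 0                 # observations accumulated per state
        x = 0
        for n in range(1, n_steps + 1):
            beta = np.log(n + 1) / c      # assumed cooling beta_n = log(n+1)/c
            while count < K(n):           # extend every estimate to K(n) samples
                sums += np.array([observe_U(z) for z in range(N)])
                count += 1
            S = sums / count              # S_{K(n),omega}(x) for every x in X
            y = rng.choice(N, p=q0[x])    # propose y ~ q0(x, .)
            # accept with probability exp(-beta_n (S(y) - S(x))^+)
            if rng.random() < np.exp(-beta * max(S[y] - S[x], 0.0)):
                x = y
        return x

Observing every state at each stage keeps $\|S_{K(n),\omega} - U\|_\infty$ meaningful, mirroring the uniform bounds used in the lemmas below; a practical variant would draw observations only at the current and proposed states.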
In order to bring the information into full play, we use the mean $S_n$ of the observed values. Our algorithm finally converges to some set in $X$, while mean values need not even belong to the sample space or the state space $X$; this differs from some adaptive learning algorithms (see Kohonen, 1984), in which the mean values are often taken as the cluster points.
Since we borrow the ideas of the proofs of Holley and Stroock (1988) and Götze (1992), and in particular adopt the general framework of Frigerio and Grillo (1993), we give only a short sketch of our proof.
Lemma 2.1. Let $A_n(x) \equiv \{\omega:\ (1/K(n)) \sum_{j=K(n-1)+1}^{K(n)} U_j(x) < 1/n^3\}$; then $\lim_{k\to\infty} P\{\bigcap_{n \ge k} A_n(x)\} = 1$.

By the Chebyshev inequality,
$$\lim_{k\to\infty} P\Bigl\{\bigcap_{n \ge k} A_n(x)\Bigr\} \ge 1 - \lim_{k\to\infty} \sum_{n \ge k} \frac{n^{6}}{K(n)^2} \sum_{j=K(n-1)+1}^{K(n)} E\, U_j^2(x) \ge 1 - 2M\, E\, U_1^2(x) \lim_{k\to\infty} \sum_{n \ge k} \frac{n^{3}}{K(n)} = 1.$$
For $\varepsilon$ and for any $\delta > 0$, by the Hájek–Rényi inequality we can take $K'_{\varepsilon,\delta} > 0$ such that for any $k \ge K'_{\varepsilon,\delta}$ and $x \in X$,
$$P\Bigl\{\omega:\ \sup_{j \ge k} |S_j(x) - U(x)| \ge \varepsilon\Bigr\} \le \frac{1}{\varepsilon^2}\Bigl(\frac{D}{k} + D \sum_{j=k+1}^{\infty} \frac{1}{j^2}\Bigr) \le \frac{\delta}{4N}.$$
Now, we take $K_{\varepsilon,\delta} \ge K'_{\varepsilon,\delta}$ such that for any $K(k) \ge K_{\varepsilon,\delta}$, $P\{\bigcap_{n \ge k} A_n(x)\} \ge 1 - \delta/4N$.
Lemma 2.2. If $\omega \in \bigcap_{x \in X} \bigl\{\bigl(\bigcap_{K(n) \ge K_{\varepsilon,\delta}} A_n(x)\bigr) \cap \{\omega:\ \sup_{K(n) \ge K_{\varepsilon,\delta}} |S_{K(n),\omega}(x) - U(x)| < \varepsilon\}\bigr\}$, then there exists a constant $M_{1,\delta} > 0$ such that for any $K(n) \ge K_{\varepsilon,\delta}$,
$$\|S_{K(n),\omega}\|_\infty \le M_{1,\delta} \quad\text{and}\quad \|S_{K(n),\omega} - S_{K(n-1),\omega}\|_\infty < \frac{M_{1,\delta}}{n^3}.$$
$$\le \frac{M}{n^3} + \frac{M_1}{n^3} \le \frac{M_{1,\delta}}{n^3}.$$
If $\omega \in \bigcap_{x \in X} \bigl\{\max_{K(n) < K_{\varepsilon,\delta}} \frac{1}{K(n)} \bigl|\sum_{j=1}^{K(n)} (U_j(x) - U(x))\bigr| \le J_\delta\bigr\}$, then there exists a constant $M_{2,\delta} > 0$ such that $\|S_{K(n),\omega}\|_\infty < M_{2,\delta}$ for $K(n) \le K_{\varepsilon,\delta}$. Moreover, it is easy to show that there exist constants $\bar C$ and $M_{3,\delta}$ satisfying $0 < \bar C \le C$ and $M_{3,\delta} > 0$ such that
Lemma 2.3. If $\omega \in B(K_{\varepsilon,\delta})$, then there exists a constant $M_{4,\delta} > 0$ (independent of $\omega$) such that
(1) $\|S_{K(n),\omega}\|_\infty < M_{4,\delta}$, $n = 1, 2, \ldots$;
(2) $\lambda_{\beta_n,\omega} \ge \bar C\, e^{-\beta_n (m + 4\varepsilon)}$, $n = 1, 2, \ldots$;
(3) $\|S_{K(n),\omega} - S_{K(n-1),\omega}\|_\infty < M_{4,\delta}/n^3$, $n = 1, 2, \ldots$.
For $\varphi \in B$, we denote
$$Q_{n,\omega}(y) = \sum_{x \in X} q_{\beta_n,\omega}(x, y)\, Q_{n-1,\omega}(x), \qquad Q_{n,\omega}(\varphi) = \sum_{x \in X} Q_{n,\omega}(x)\, \varphi(x), \qquad n = 1, 2, \ldots$$
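As a small illustration (not from the paper), both definitions amount to a vector-matrix product and an inner product; in the sketch below, Q_prev, q_beta and phi are hypothetical names for $Q_{n-1,\omega}$, $q_{\beta_n,\omega}$ and $\varphi$.

    import numpy as np

    def evolve_law(Q_prev, q_beta):
        """One step Q_n(y) = sum_x q_{beta_n,omega}(x, y) Q_{n-1}(x):
        push the law of the chain through the transition matrix."""
        return Q_prev @ q_beta        # row vector times (N, N) matrix

    def expect(Q_n, phi):
        """Q_n(phi) = sum_x Q_n(x) phi(x) for a test function phi in B."""
        return float(np.dot(Q_n, phi))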
Proof of the Main Theorem. Let $A_\varepsilon \equiv \{x:\ U(x) \ge \lambda + \varepsilon\}$. By Lemma 2.2, for any $\delta > 0$ there exists $n_0 \ge K_{\varepsilon,\delta}$ such that for any $n \ge n_0$,
$$P\{\pi_{\beta_n,\omega}(A_\varepsilon) < C_1 n^{-\varepsilon/4c}\} \ge 1 - \frac{\delta}{4}.$$
Let $\hat B_n(K_{\varepsilon,\delta}) \equiv \{\omega:\ \pi_{\beta_n,\omega}(A_\varepsilon) < C_1 n^{-\varepsilon/4c}\} \cap B(K_{\varepsilon,\delta})$. If $\omega \in \hat B_n(K_{\varepsilon,\delta})$, then
$$Q_{n,\omega}(A_\varepsilon) \le \pi_{\beta_n,\omega}(A_\varepsilon) + |Q_{n,\omega}(A_\varepsilon) - \pi_{\beta_n,\omega}(A_\varepsilon)| < C_1 n^{-\varepsilon/4c} + C_{2,\delta}\, n^{\gamma}.$$
Hence we have
$$P\Bigl\{Q_\omega\bigl(U(Y_n) \ge \varepsilon + \min_{x \in X} U(x)\bigr) < C_1 n^{-\varepsilon/4c} + C_{2,\delta}\, n^{\gamma}\Bigr\} \ge P\{\hat B_n(K_{\varepsilon,\delta})\} \ge 1 - \delta.$$
3. Unsolved problems
According to the referee's suggestions, we use $S_{K(n)}$ at the $n$th iteration of the SA algorithm rather than $S_n$ as in our original manuscript. This idea leads to the expected condition $c > m$ on the speed of decrease of the temperature in the main theorem, which becomes more delicate. As the referee pointed out, the question of the optimal choice of $K(n)$ deserves further study: how can we choose $K(n)$ such that the speed of convergence is as fast as possible for given $\delta$, $\varepsilon$? Another question is how to judge whether this SA algorithm with random energy has come sufficiently close to a required set within a limited time, and to determine when it should be stopped.
Acknowledgements
We would like to thank the referee for his valuable comments, which were a great
incentive to improve our paper.
References
Benveniste, A., Métivier, M., Priouret, P., 1990. Adaptive Algorithms and Stochastic Approximations. Springer, Berlin.
Diaconis, P., Stroock, D., 1991. Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab. 1
(1), 36–61.
Frigerio, A., Grillo, G., 1993. Simulated annealing with time-dependent energy function. Math. Z. 213,
97–116.
Fang, H.T., Gong, G.L., Qian, M.P., 1997. Annealing of iterative stochastic schemes. SIAM J. Control
Optim. 35 (6), 1886–1907.
Gelfand, S.B., Mitter, S.K., 1991. Recursive stochastic algorithms for global optimization in R^d. SIAM J. Control Optim. 29, 999–1018.
Gelfand, S.B., Mitter, S.K., 1993. Metropolis-type annealing algorithms for global optimization in R^d. SIAM J. Control Optim. 31, 111–131.
Götze, F., 1992. Rate of convergence of simulated annealing processes. Preprint.
Hertz, J., Krogh, A., Palmer, R.G., 1991. Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity. Addison-Wesley, Reading, MA.
Holley, R.A., Stroock, D.W., 1988. Simulated annealing via Sobolev inequalities. Comm. Math. Phys. 115,
553–569.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220, 671–680.
Kohonen, T., 1984. Self-organization and Associative Memory. Springer, New York.
Kushner, H.J., 1987. Asymptotic global behavior for stochastic approximation and diffusions with slowly decreasing noise effects: global minimization via Monte Carlo. SIAM J. Appl. Math. 47, 169–185.
Ljung, L., Pflug, G., Walk, H., 1992. Stochastic Approximation and Optimization of Random Systems. Birkhäuser, Basel.
Löwe, M., 1995. Simulated annealing with time-dependent energy function via Sobolev inequalities. Stochastic Process. Appl. 63, 221–233.
Métivier, M., Priouret, P., 1987. Théorèmes de convergence presque sûre pour une classe d'algorithmes stochastiques à pas décroissant. Probab. Theory Related Fields 74, 403–428.
Michalewicz, Z., 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer, Berlin.
Wentzell, A.D., 1990. Limit Theorems on Large Deviations for Markov Stochastic Processes. Kluwer
Academic Publishers, Dordrecht.