Numerical Methods for Stochastic Control Problems in Continuous Time
24
Stochastic Control
Edited by I. Karatzas
M. Yor
With 40 Figures
Springer
Harold J. Kushner
Paul Dupuis
Division of Applied Mathematics
Brown University
Providence, RI 02912, USA
Managing Editors
I. Karatzas
Departments of Mathematics and Statistics
Columbia University
New York, NY 10027, USA
M. Yor
CNRS, Laboratoire de Probabilités
Université Pierre et Marie Curie
4, Place Jussieu, Tour 56
F-75252 Paris Cedex 05, France
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC), except for brief
excerpts in connection with reviews or scholarly analysis. Use in connection with any form
of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even
if the former are not especially identified, is not to be taken as a sign that such names, as
understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely
by anyone.
ISBN 978-1-4612-6531-3 ISBN 978-1-4613-0007-6 (eBook)
DOI 10.1007/978-1-4613-0007-6
Introduction 1
References 455
Index 467
List of Symbols 473
Introduction
Changes in the second edition. The second edition differs from the first
in that there is a full development of problems where the variance of the
diffusion term and the jump distribution can be controlled. Also, a great
deal of new material concerning deterministic problems has been added,
including very efficient algorithms for a class of problems of wide current
interest.
The cost function can be any of the standard types: discounted, stopped on first exit from a set,
finite time, optimal stopping, average cost per unit time over the infinite
time interval, and so forth. There might be separate costs when the process
is on the boundary and when it is in the interior of the set of interest. In
fact all of the standard cost functionals can be dealt with by the meth-
ods to be presented. There is a close connection between approximation
methods for stochastic control and those for optimal nonlinear filtering,
and approximation methods for the latter problem are also discussed.
The class of methods to be dealt with is referred to generically as the
Markov chain approximation method. It is a powerful and widely usable
set of ideas for numerical and other approximation problems for either
controlled or uncontrolled stochastic processes, and it will be shown that
it has important applications to deterministic problems as well. The ini-
tial development of the approximation method and the convergence proofs
appeared in the first author's 1977 book. Since that time new classes of
problems have arisen to which the original proofs could not be applied di-
rectly, the techniques of approximation and mathematical proof have been
considerably streamlined, and also extended to cover a large part of the
new problems of interest in continuous time stochastic control. In addition,
many new techniques for actually doing the computations have been de-
veloped. The present book is a revision and updating of the 1992 edition.
There is much new material on the problems of jump and variance control
and on deterministic problems as well, with possible discontinuities in the
data.
The basic idea of the Markov chain approximation method is to approx-
imate the original controlled process by an appropriate controlled Markov
chain on a finite state space. One also needs to approximate the original cost
function by one which is appropriate for the approximating chain. These
approximations should be chosen such that a good numerical approxima-
tion to the associated control or optimal control problem can be obtained
with a reasonable amount of computation. The criterion which must be
satisfied by the process approximation is quite mild. It is essentially what
we will call "local consistency." Loosely speaking, this means that from a
local point of view, the conditional mean and covariance of the changes in
state of the chain are proportional to the local mean drift and covariance
for the original process. Such approximations are readily obtained by a va-
riety of methods and are discussed extensively in Chapters 4 and 5. The
numerical problem is then to solve the problem for the approximating con-
trolled chain. Methods for doing this are covered in detail in Chapters 6 to
8, with the basic concepts being in Chapter 6. One needs to prove that the
solutions to the problems with the approximating chain actually converge
to the correct value as some approximation parameter goes to zero. One
of the great advantages of the approach is that this can often be done by
probabilistic methods which do not require the use of any of the analytical
properties of the actual solution. This is particularly important since for
stochastic control is in its infancy and much more effort is needed on all
phases of the area.
It is a pleasure to acknowledge the considerable help of Luiz Felipe Mar-
tins, John Oliensis, Felisa Vasquez-Abad and Jichuan Yang in the prepa-
ration of the first edition. We also thank John Oliensis for the figures in
Chapter 15. This work has been supported for many years by the National
Science Foundation and the Army Research Office.
1
Review of Continuous Time Models
In this book we will consider methods for numerically computing the value
function for certain classes of controlled continuous time stochastic and
deterministic processes. The purpose of the present chapter is to provide
an introduction to and some of the background material for controlled
diffusions and controlled jump diffusions. These types of processes include
many of the models that are commonly used. This section is only intended
to serve as a review of the main ideas and for purposes of reference. Other
models (e.g., singularly controlled diffusions) that are also of interest will
be introduced and elaborated on in the appropriate later sections of the
book.
Our main interest in the present chapter is in constructing and estab-
lishing certain properties of the processes. Chapter 9 will also deal with
important background material, such as alternative characterizations of
the processes and the theory of weak convergence. Section 1.1 presents the
definitions and fundamental inequalities of martingales. In Section 1.2, we
review integration with respect to the Wiener process and state the associ-
ated chain rule (Ito's formula). With the definition of the stochastic integral
and the appropriate martingale estimates in hand, in Sections 1.3 and 1.4
we define what is meant by a solution of a stochastic differential equation
and outline the proof of existence of solutions. The processes defined by the
solutions of these stochastic differential equations will serve as our models
of controlled continuous time processes with continuous sample paths. We
also discuss the notion of uniqueness of solutions that will be suitable for
our later work. We will first consider processes without control, and then
indicate the extension to the controlled case. For purposes of numerical
See [83].
A random variable τ : Ω → [0, ∞] is called an F_t-stopping time if
{τ ≤ t} ∈ F_t for all t ∈ [0, ∞). If x(·) is an F_t-martingale and τ is a
uniformly bounded F_t-stopping time, then the stopped process x(t ∧ τ) is
also an F_t-martingale. Thus, (1.2) and (1.3) also hold if we replace T by
T ∧ τ, where τ is any F_t-stopping time.
If there exists a nondecreasing sequence {τ_n, n = 1, 2, ...} of F_t-stopping
times such that τ_n → ∞ w.p.1 and such that for each n the stopped process
x(t ∧ τ_n) is a martingale, then x(·) is called an F_t-local martingale.
simple model for the driving noise. The only quantity needing explanation
in the expression for x(·) is the term ∫₀ᵗ σ(x(s)) dw(s), to which we now
turn.
We will consider stochastic integrals with respect to two basic processes.
The first process is the Wiener process. As is well known, the resulting
stochastic integral and related theory of stochastic differential equations
(due to K. Ito) provide a very convenient family of models that are Marko-
vian and possess continuous sample paths. In the beginning of the section,
we define the Wiener process and recall some basic properties. We then re-
view Ito's definition of integration with respect to the Wiener process and
state the chain rule. In order to model processes involving jumps, we will
make use of Poisson random measures as a driving term. The associated
stochastic integral is, in a certain sense, easier to define than for the case of
the Wiener process. This integral will be defined and the combined chain
rule for both types of driving noise will be given in Section 1.5. If A is a col-
lection of random variables defined on a probability space (Ω, F, P), then
we use F(A) to denote the σ-algebra generated by A. If S is a topological
space, then B(S) is used to denote the σ-algebra of Borel subsets of S.
Wiener Process. Let (Ω, F, P) be a probability space and let {F_t, t ≥ 0}
be a filtration defined on it. A process {w(t), t ≥ 0} is called an F_t-Wiener
process if it satisfies the following conditions.
1. w(0) = 0 w.p.1.
sample path process that is both Markovian and a martingale. The fact that
it has continuous sample paths and is also a martingale implies that the sample
paths of w(·) are of unbounded variation over any nontrivial time interval
(w.p.1). This excludes defining ∫σ(t)dw(t) by any pathwise construction if
we wish to allow a large class of integrands. Nonetheless, a useful integral
may be defined in a straightforward manner if we properly restrict the class
of allowed integrands. We will impose conditions on the integrand which
will imply that the resulting integral is an Ft-martingale when considered
as a function of the upper limit of integration. Thus, we will be able to
use the martingale estimates in the construction and applications of the
integral.
Remark. In this book we will always assume the coefficients in the equa-
tions are bounded. This is not much of a restriction for our purposes because
the state spaces of the processes will be bounded for numerical purposes.
Because of this boundedness our definitions are somewhat simpler than is
typical in thorough treatments of the theory of SDE.
We are now in the position to define the integral with respect to a Wiener
process. For full details concerning the arguments used below, the reader
may consult the book of Karatzas and Shreve [83]. The integral is defined
for an arbitrary integrand in Σ_b via an approximation argument. In general,
the stochastic integral defined below will be unique only in the sense that
any two versions will have sample paths that agree with probability one. We
will follow the usual convention of identifying any process with the class
of processes whose sample paths are identical with probability one and,
therefore, omit the corresponding qualification in the arguments below.
Definition and Elementary Properties of ∫σ dw for σ ∈ Σ_b. Let
w(·) be a standard F_t-Wiener process and let σ ∈ Σ_b be given. For a simple
integrand, constant on the intervals [t_i, t_{i+1}), the integral is defined by
$$\int_0^t \sigma(u)\,dw(u) = \sum_{i=0}^{n-1} \sigma(t_i)\big[w(t_{i+1}) - w(t_i)\big] + \sigma(t_n)\big[w(t) - w(t_n)\big]$$
for t ∈ [t_n, t_{n+1}). It can be shown [83, Proposition 3.2.6] that for each
σ ∈ Σ_b there exist simple integrands {σ_n, n ∈ ℕ} ⊂ Σ_b such that for each T ∈ [0, ∞),
$$E\int_0^T |\sigma_n(u) - \sigma(u)|^2\,du \to 0. \qquad (2.1)$$
The integral is a martingale in the upper limit of integration:
$$E\Big[\int_0^t \sigma(u)\,dw(u)\,\Big|\,\mathcal{F}_s\Big] = \int_0^s \sigma(u)\,dw(u), \quad s \le t. \qquad (2.2)$$
These properties follow easily from the definition of the integral for simple σ
and are extended to integrands in Σ_b by approximation.
Given an open subset U of some Euclidean space, we let Ck(U) denote
the set of all real valued functions on U that have continuous derivatives
up to and including order k.
Ito's Formula. Let f ∈ C¹(ℝ) and let x(t) = x(0) + ∫₀ᵗ b(s)ds. The change
of variable formula for the composed function f(x(t)) is of course
$$f(x(t)) - f(x(0)) = \int_0^t f_x(x(s))\,b(s)\,ds.$$
When x(·) also involves a stochastic integral, it turns out
that the change of variable formula takes a slightly more cumbersome form
and is valid under more restrictive conditions than in the classical case.
Nonetheless, it is still an extraordinarily powerful tool. Consider the more
general form
$$x(t) = x(0) + \int_0^t b(s)\,ds + \int_0^t \sigma(s)\,dw(s). \qquad (2.5)$$
It is customary to use the differential notation dx(t) = b(t)dt + σ(t)dw(t)
to express the relationship (2.5). We will not state Ito's formula under very
general conditions, but only in the form needed later in the book. Thus,
we assume b(·) and σ(·) are in Σ_b. Then Ito's formula states that for any
f ∈ C²(ℝ),
$$f(x(t)) - f(x(0)) = \int_0^t f_x(x(s))\,dx(s) + \frac{1}{2}\int_0^t f_{xx}(x(s))\,\sigma^2(s)\,ds,$$
where
$$h_i(t) = \sum_{j=1}^{r} \int_0^t \sigma_{ij}(s)\,dw_j(s)$$
for i ∈ {1, ..., k}. Suppose that b(·) = (b₁(·), ..., b_k(·))′ and that b_i(·) ∈ Σ_b
for each i ∈ {1, ..., k}. Define
in lieu of (2.6). Then the vector version of Ito's formula is as follows. For
any f ∈ C²(ℝᵏ), let f_x(·) and f_xx(·) denote the gradient and Hessian
matrix of f, respectively. Define a(s) = σ(s)σ′(s). Then
where
Let w(·) be a vector valued F_t-Wiener process and let x(0) be a given
F_0-measurable random vector. By a solution to the SDE described by
Weak Existence. We say that weak existence holds if given any probability
measure μ on ℝᵏ there exists a probability space (Ω, F, P), a filtration
F_t, an F_t-Wiener process w(·), and an F_t-adapted process x(·) satisfying
(3.1) for all t ≥ 0, as well as P{x(0) ∈ Γ} = μ(Γ).
Ito's Formula and the Differential Operator. Let x(·) be any solution
to (3.1), and suppose that the coefficients b(x(·)) and σ(x(·)) satisfy the
conditions assumed of b(·) and σ(·) in the last section. The link between
the process x(·) and certain second order partial differential equations is
provided by Ito's formula and the differential operator that appears therein.
Let a(x) = σ(x)σ′(x), and for any f ∈ C²(ℝᵏ) let
$$(\mathcal{L}f)(x) = f_x'(x)\,b(x) + \frac{1}{2}\,\mathrm{tr}\big[f_{xx}(x)\,a(x)\big]. \qquad (3.5)$$
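As a concrete illustration of (3.5) (a sketch, not from the book; the coefficients and test function below are hypothetical), the generator can be evaluated at a point with finite-difference derivatives:

```python
import numpy as np

# Evaluate (Lf)(x) = f_x(x)'b(x) + (1/2) tr[f_xx(x) a(x)], a(x) = sigma(x)sigma(x)',
# using central finite differences for the gradient and Hessian of f.
def generator(f, b, sigma, x, eps=1e-5):
    k = len(x)
    grad = np.zeros(k)
    hess = np.zeros((k, k))
    for i in range(k):
        e_i = np.eye(k)[i] * eps
        grad[i] = (f(x + e_i) - f(x - e_i)) / (2 * eps)
        for j in range(k):
            e_j = np.eye(k)[j] * eps
            hess[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                          - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps**2)
    a = sigma(x) @ sigma(x).T
    return grad @ b(x) + 0.5 * np.trace(hess @ a)

# Hypothetical data: f(x) = |x|^2, b(x) = -x, sigma = I; then Lf = -2|x|^2 + k.
x = np.array([1.0, 2.0])
print(generator(lambda y: y @ y, lambda y: -y, lambda y: np.eye(2), x))  # ~ -8.0
```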
Theorem 3.1. Assume (A3.1). Then for every deterministic initial condi-
tion x(O), the SDE (3.1) has a strong solution that is unique in the strong
(and therefore also in the weak) sense.
The proof turns on the following estimate: let z_i, i = 1, 2, be continuous
F_t-adapted processes and define
$$y_i(t) = x(0) + \int_0^t b(z_i(s))\,ds + \int_0^t \sigma(z_i(s))\,dw(s),$$
Δy(t) = y₁(t) − y₂(t), and Δz(t) = z₁(t) − z₂(t). Then for each T ∈ (0, ∞)
there exists L ∈ (0, ∞) such that for all 0 ≤ t ≤ T,
$$E\Big[\sup_{0\le s\le t}|\Delta y(s)|^2\Big] \le L\int_0^t E\Big[\sup_{0\le u\le s}|\Delta z(u)|^2\Big]\,ds. \qquad (3.7)$$
This follows directly from the Lipschitz continuity properties of b and σ, the
martingale property of the stochastic integral, and the estimate (1.3). For
more details, the reader may consult [83, Section 5.2]. From this estimate,
one may readily prove strong existence and strong uniqueness, as we now
show.
To prove uniqueness, let x₁(·) and x₂(·) both be solutions, and let f(t) =
E sup_{0≤s≤t} |x₁(s) − x₂(s)|². By taking x_i(·) = y_i(·) = z_i(·), i = 1, 2, and
applying (3.7), we obtain
$$f(t) \le L\int_0^t f(s)\,ds.$$
Then Gronwall's inequality [56] implies f(t) = 0 for all t ∈ [0, T]. Because
T is arbitrary, this proves uniqueness in Cᵏ[0, ∞).
A solution to (3.1) can be constructed by a variation on the classical
technique of Picard iteration. A sequence of processes {x_n(·)} is defined
recursively by x₀(t) = x(0), t ≥ 0, and
$$x_{n+1}(t) = x(0) + \int_0^t b(x_n(s))\,ds + \int_0^t \sigma(x_n(s))\,dw(s).$$
By the way in which the processes were defined, the elements of this sequence
are F_t-adapted processes with continuous sample paths. Applying
(3.7) for n ≥ 1 with y₁(·) = x_{n+1}(·), y₂(·) = z₁(·) = x_n(·), and
z₂(·) = x_{n−1}(·), we obtain
occurs infinitely often with probability zero. Therefore, off a set N of zero
probability, the sample paths of x_n(·) are a Cauchy sequence in Cᵏ[0, T]. Let
x(·, ω) denote the limit of x_n(·, ω) for ω ∉ N. Because T is arbitrary, we
can assume that the convergence, in fact, takes place in Cᵏ[0, ∞). Clearly,
x(·) is F_t-adapted. The assumed continuity properties of b(·) and σ(·),
together with (1.3) and (2.3), then imply that x(·) satisfies (3.1), which proves strong existence.
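The following sketch (hypothetical coefficients, not from the book) carries out the Picard iteration for a single fixed Wiener path, with the time integrals replaced by Euler sums on a grid; the iterates converge rapidly, in line with the estimate (3.7).

```python
import numpy as np

# Picard iteration x_{n+1}(t) = x(0) + int_0^t b(x_n)ds + int_0^t sigma(x_n)dw
# along one fixed Wiener path; hypothetical 1-d coefficients below.
def b(x): return -x
def sigma(x): return 0.5 + 0.0 * x   # constant diffusion, written as a function

T, n_steps = 1.0, 1000
dt = T / n_steps
rng = np.random.default_rng(1)
dw = rng.normal(0.0, np.sqrt(dt), n_steps)  # fixed Wiener increments

x0 = 1.0
x = np.full(n_steps + 1, x0)                # x_0(t) = x(0) for all t
for _ in range(20):
    drift = np.concatenate(([0.0], np.cumsum(b(x[:-1]) * dt)))
    noise = np.concatenate(([0.0], np.cumsum(sigma(x[:-1]) * dw)))
    x_new = x0 + drift + noise
    if np.max(np.abs(x_new - x)) < 1e-10:   # sup-norm contraction kicks in
        break
    x = x_new
```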
By Ito's formula,
$$R(t) = 1 + \int_0^t R(s)\,z'(s)\,dw(s).$$
Because z(·) is bounded, E|R(t)| < ∞. Thus, the process R(·) is a martingale
and therefore ER(t) = 1 for all t ∈ [0, ∞).
Now fix T ∈ (0, ∞), and define a probability measure P̃_T on (Ω, F_T) by
$$d\tilde P_T = R(T)\,dP. \qquad (3.9)$$
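A quick Monte Carlo illustration (a sketch with a hypothetical bounded integrand z(s) = sin s; we assume the usual exponential form R(t) = exp(∫₀ᵗ z′dw − ½∫₀ᵗ|z|²ds), which solves the equation above): the sample average of R(T) should be near one, consistent with ER(T) = 1.

```python
import numpy as np

# Monte Carlo check that the exponential martingale R(T) has mean one.
rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps)
z = np.sin(t)                          # z(s), bounded as required

dw = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
log_R = dw @ z - 0.5 * np.sum(z**2) * dt   # log R(T) per path
print(np.exp(log_R).mean())                # ~ 1.0
```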
valued with the property that σ₁⁻¹(x) is uniformly bounded for x ∈ ℝᵏ.
Consider the stochastic differential equation
$$dx_1 = \sigma_1(x)\,dw_1, \qquad dx_2 = b_2(x)\,dt + \sigma_2(x)\,dw_2, \qquad (3.11)$$
where the dimensions of the x_i(·), i = 1, 2, b₂(·), and σ₂(·) are all compatible.
Assume that the drift vector and diffusion matrix of this equation
satisfy the continuity and boundedness conditions assumed for the Picard
iteration method, or any other set of conditions which guarantees the
existence of a weak sense solution. Let b₁(·) be a bounded Borel measurable
function. If we define z₁(·) = σ₁⁻¹(x(·))b₁(x(·)), and R(·), P̃, and w̃(·) by
(3.8), (3.9), and (3.10), respectively, then under P̃, x(·) solves
to (3.13). We say that weak uniqueness holds if equality of the joint distributions
of (u_i(·), w_i(·), x_i(0)) under P_i, i = 1, 2, implies the equality of the
distributions of (x_i(·), u_i(·), w_i(·), x_i(0)) under P_i, i = 1, 2.
Strong solutions that are unique in the strong sense and weak solutions
that are unique in the weak sense may be constructed in exactly the same
way as for the case of uncontrolled diffusions. For example, assume the
following analogue of (A3.1).
for all x ∈ ℝᵏ, y ∈ ℝᵏ, and u ∈ U. Furthermore, the function b : ℝᵏ × U →
ℝᵏ is measurable, and sup_{u∈U} |b(0, u)| < ∞.
Under this assumption the Picard iteration and Girsanov transformation
methods may be used to get existence and uniqueness results analogous to
those without control. The details are the same as those in the case with
no control as long as the control process is F_t-adapted and measurable,
with the only changes in the proofs being notational.
For fixed α ∈ U, we define the differential operator 𝓛^α by
$$(\mathcal{L}^\alpha f)(x) = f_x'(x)\,b(x, \alpha) + \frac{1}{2}\,\mathrm{tr}\big[f_{xx}(x)\,a(x)\big].$$
where the prime denotes transpose, cov stands for covariance, and a is a
k × k matrix. Let [a] denote the integer part of a. It is well known that the
process
$$x^n(t) = n^{-1/2}\sum_{i=1}^{[nt]} \xi_i + x$$
tends weakly (see Section 9.1 for the definition) to the solution of the SDE
Because the process y^n(·) does not involve the constraint, its behavior is
easy to determine. As remarked previously, y^n(·) will converge weakly to
the solution to dy(t) = a dw(t), y(0) = x. The process z^n(·) plays the role
of a bookkeeping device, by recording the effects of the projections in such
a way that x^n(·) can be recovered from y^n(·) by adding z^n(·). The z^n(·)
has three important properties. First, z^n(·) can change only at times t such
that x^n(t) ∈ ∂G. Thus, the evolution of x^n(·) is the same as that of y^n(·)
when x^n(·) is away from ∂G. Second, the direction of change of z^n(·) is
determined by the constraint mechanism, and the last property is that the
change in z^n(·) at any given time is the minimal amount needed to keep
x^n(·) in G.
part 5 implies that η is only allowed to push in the directions consistent with
the current position of φ when it is on the boundary.
A comparison of these properties with the properties that one would ex-
pect of the sample paths of reflecting diffusions indicates a close connection.
Our definition of a stochastic differential equation with reflection (SDER)
is as follows. (The definition will be given without a control. Extending the
definition to include a control is straightforward.) Consider a probability
space (Ω, F, P) on which is defined a filtration {F_t, t ≥ 0}. Let b(·) and σ(·)
be functions of the appropriate dimensions and suppose that {w(t), t ≥ 0}
is an r-dimensional standard F_t-Wiener process.
where
$$|z|(t) = \int_0^t I_{\{x(s)\in\partial G\}}\,d|z|(s) < \infty,$$
and where there exists measurable γ(s) ∈ r(x(s)) (μ_z-a.e.) such that
$$z(t) = \int_0^t \gamma(s)\,d|z|(s) \quad (\text{w.p.1}).$$
In other words, (x(·), z(·)) should solve (on a pathwise and w.p.1 basis)
the SP for ψ(·) = x(0) + ∫₀ᵗ b(x(s))ds + ∫₀ᵗ σ(x(s))dw(s).
As in the case of diffusions without boundaries, there are two senses in
which solutions can be said to exist and also two senses in which they
can be said to be unique. These definitions are simply the exact analogues
of those for the case of no boundary. In general, the existence of strong or
weak solutions to the SDER and the relevant uniqueness properties depend
on regularity properties of G and r(·). Those approaches that are based on
the Skorokhod Problem use these regularity properties to derive estimates
for the mapping ψ ↦ (φ, η) defined by solving the Skorokhod Problem.
We next present a few very simple examples for which this mapping is Lip-
schitz continuous. This allows an elementary derivation of (3.7), after which
the analysis proceeds just as in the case of unconstrained diffusions. The
basic differences between these examples and the more general situations
considered in [41, 115, 134, 147] are the more involved calculations required
to get (3.7).
Example 4.3. (Anderson and Orey) [5] Let n be a given unit vector,
and let G be the interior of {x : ⟨x, n⟩ ≤ 0}. Thus, n is the outward normal
at all points of ∂G. Let r be a unit vector that satisfies ⟨r, n⟩ < 0. We will
take r(x) = {r} for all x ∈ ∂G. Suppose that ψ(·) is given with ψ(0) ∈ G.
If we define
$$|\eta|(t) = -\Big(0 \vee \sup_{0\le s\le t} \langle\psi(s), n\rangle\Big)\Big/\langle r, n\rangle, \qquad \eta(t) = |\eta|(t)\,r,$$
and, therefore,
$$\varphi(t) = \psi(t) + \eta(t),$$
and ȳ_i(t) = Γ(y_i)(t) for t ≥ 0 and i = 1, 2. Obviously, the ȳ_i(·) are measurable,
F_t-adapted processes with continuous sample paths. Equations (4.1)
and (4.2) imply uniqueness of the pair (φ, η) given ψ. If x ∈ G and ψ(t) = x
for all t ≥ 0, then (φ(t), η(t)) = (x, 0) is the solution to the SP. This fact
together with (4.2) implies the continuity of φ whenever ψ is continuous
and φ = Γ(ψ). Thus, the mapping Γ is a Lipschitz continuous mapping of
Cᵏ[0, ∞) into itself. Furthermore, uniqueness of the solution to the SP and
our explicit representation of the solution imply that φ(t) is a measurable
function of {ψ(s), s ∈ [0, t]}. Therefore, the processes ȳ_i(·), i = 1, 2, are
measurable, F_t-adapted processes with continuous sample paths. Define
Δȳ(t) = ȳ₁(t) − ȳ₂(t) and also Δy(·) and Δz(·) in an analogous fashion.
Then, under (A3.1), equation (3.7) gives
$$E\Big[\sup_{0\le s\le t}|\Delta\bar y(s)|^2\Big] \le L\int_0^t E\Big[\sup_{0\le u\le s}|\Delta z(u)|^2\Big]\,ds.$$
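The explicit solution in Example 4.3 is easy to implement for a discretized path. The following sketch (hypothetical data, not from the book) builds φ = ψ + η from the running-maximum formula above and checks that φ stays in G:

```python
import numpy as np

# Discrete-time Skorokhod map of Example 4.3 on the half-space
# G = {x : <x, n> <= 0}, constraint direction r with <r, n> < 0.
def skorokhod_halfspace(psi, n, r):
    proj = psi @ n
    # |eta|(t) = -(0 v sup_{s<=t} <psi(s), n>) / <r, n>
    eta_mag = np.maximum.accumulate(np.maximum(proj, 0.0)) / (-(r @ n))
    eta = np.outer(eta_mag, r)
    return psi + eta, eta          # phi = psi + eta

# Hypothetical 2-d path, outward normal n = (0, 1), r = -n (normal reflection).
rng = np.random.default_rng(3)
psi = np.cumsum(rng.normal(0.0, 0.1, (1000, 2)), axis=0)
n_vec, r_vec = np.array([0.0, 1.0]), np.array([0.0, -1.0])
phi, eta = skorokhod_halfspace(psi, n_vec, r_vec)
assert (phi @ n_vec <= 1e-12).all()   # phi remains in G
```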
Let n(·) denote the set of outward normals and suppose r(·) = −n(·). Thus,
the constraining action is applied along the direction of the inward normal.
Note that r(·) is multi-valued at any of the "corner" points of ∂G. It is
proved in [42] that the solution mapping to the Skorokhod Problem is again
Lipschitz continuous in the sense of equations (4.1) and (4.2), although the
coefficients in this case are k^{1/2} and k^{1/2} + 1, respectively. Therefore, the SDER
may be solved and uniqueness proved just as in Example 4.3. •
where b(·) and σ(·) are as in the preceding sections and the J(t) term
produces the jumps. For the jump term we would like to specify (at least
approximately) the probability that a jump occurs in any small time in-
terval together with the distribution of any resulting jumps as functions of
the past history of the process. Between jumps, the term J(·) is constant.
In order to preserve the Markov property, the "jump intensity" at time t
and distribution of any jumps at time t should depend only on lim_{s↑t} x(s).
Let λ(·) be a function mapping ℝᵏ into [0, ∞), and let Π̄(x, dy) be a
probability transition kernel on ℝᵏ. For now assume that λ(·), b(·) and
σ(·) are all continuous and that Π̄(x, ·) is continuous in x in the topology of
weak convergence. Let Δt > 0 be small and fix t. A rough description of the
term J(·) that is consistent with the properties described above is as follows.
Let x(t−) = lim_{s↑t} x(s). With probability equal to λ(x(t−))Δt + o(Δt), J(·)
will jump once at some time in the interval [t, t + Δt]. The probability of
two or more jumps is o(Δt). Thus, λ(·) gives the overall jump rate. Given
that a jump has occurred, its distribution will be given approximately by
Π̄(x(t−), ·). Between jumps, the process x(·) behaves like a diffusion process
with no jumps and with the local properties described by b(·) and σ(·).
The general theory and treatments of various approaches to such pro-
cesses can be found in [75, 78, 79]. For the purposes of this book, the pro-
cesses may be constructed and the needed properties proved using rather
simple arguments.
random variables are independent of w(·). Let ν₀ = 0 and ν_{n+1} = ν_n + τ_n.
The ν_n will be the jump times of the process. Let q : ℝᵏ × ℝⁿ → ℝᵏ be
a bounded measurable function. Starting with a given initial condition x,
we construct a solution x₁(·) to
Then define
and so on. The process thus constructed will be defined for all t ≥ 0 since
ν_n → ∞ as n → ∞ (w.p.1). The mutual independence of the components
used to construct the process implies the Markov property.
The process we have constructed satisfies the description given at the
beginning of the section, with
$$\bar\Pi(x, A) = \frac{\lambda}{\lambda(x)}\,\Pi\big(\{\rho : q(x, \rho) \in A,\ q(x, \rho) \ne 0\}\big). \qquad (5.2)$$
A5.1. The function λ(x) is continuous and uniformly bounded, and the
support of Π̄(x, ·) is contained in some compact set that is independent
of x. Furthermore, Π̄(x, ·) is continuous as a mapping from x ∈ ℝᵏ
into the space of probability measures endowed with the topology of weak
convergence.
It can be shown [79] that if λ(·) and Π̄(·, ·) satisfy (A5.1), then a λ < ∞,
a probability measure Π(·), and a bounded measurable function q(·, ·) can
be found so that (5.1) and (5.2) hold. For convenience, we will consider λ,
Π(·), and q(·, ·) as characterizing the jump part of a jump diffusion.
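The construction described above is straightforward to simulate. The sketch below (hypothetical coefficients and jump data, not from the book) runs an Euler step for the diffusion between exponential(λ) jump times and applies a jump q(x, ρ) with ρ drawn from Π at each jump time:

```python
import numpy as np

# Simulating a jump diffusion: diffusion (Euler) between exponential jump
# times; at a jump time the state moves by q(x, rho), rho ~ Pi.
rng = np.random.default_rng(4)
lam, T, dt = 2.0, 5.0, 1e-3

def b(x): return -x                     # hypothetical drift
def sigma(x): return 0.3                # hypothetical diffusion coefficient
def q(x, rho): return rho               # hypothetical bounded jump function

x, t = 1.0, 0.0
next_jump = rng.exponential(1.0 / lam)
while t < T:
    if t + dt >= next_jump:             # at most one jump per step (sketch)
        rho = rng.uniform(-0.5, 0.5)    # draw from Pi, compact support
        x += q(x, rho)
        next_jump += rng.exponential(1.0 / lam)
    x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.normal()
    t += dt
```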
together with the F₀-measurable initial condition x(0), what is meant is
an F_t-adapted process x(·) with paths in Dᵏ[0, ∞) which satisfies the
integrated form
$$x(t) = x(0) + \int_0^t b(x(s))\,ds + \int_0^t \sigma(x(s))\,dw(s) + \int_{[0,t]\times\mathbb{R}^n} q(x(s-), \rho)\,N(ds\,d\rho). \qquad (5.4)$$
In complete analogy with the case of diffusions with no jumps, we have
definitions of weak and strong existence, and weak and strong uniqueness.
Because we will only use the weak sense properties in both cases, statements
are only given for these cases.
Weak Existence. We say that weak existence holds if given any probability
measure μ on ℝᵏ, there exists a probability space (Ω, F, P), a filtration
F_t, an F_t-Wiener process w(·), an F_t-Poisson random measure N(·),
and an F_t-adapted process x(·) satisfying (5.4) for all t ≥ 0, as well as
P{x(0) ∈ A} = μ(A).
Ito's Formula. The following formula follows directly from the definition
of a solution to (5.4) and the corresponding formula for the case of diffusions
with no jumps.
c(x + q(x, ρ)) = x + q(x, ρ).
Thus, without loss, we can assume that the following convention is always
in effect. Whenever we are dealing with a reflected jump diffusion, we will
where, as usual, we interpret this equation via its integrated form. We have
the following definitions.
to (5.6). We say that weak uniqueness holds if equality of the joint distributions
of (u_i(·), w_i(·), N_i(·), x_i(0)) under P_i, i = 1, 2, implies the equality
of the distributions of (x_i(·), u_i(·), w_i(·), N_i(·), x_i(0)) under P_i, i = 1, 2.
By arguing in the same way as for the case without control, one can
show that if there is weak sense uniqueness for the case of diffusions with
arbitrary U -valued admissible control laws and initial conditions, then
there is weak sense uniqueness here as well.
deals with the optimization problem when the control stops at the first
moment of reaching a target or stopping set. The basic concept of con-
traction map is introduced and its role in the solution of the functional
equations for the costs is emphasized. Section 2.5 gives the results for the
case where the process is of interest over a finite time only. The chapter
contains only a brief outline. Further information concerning controlled or
uncontrolled Markov chain models can be found in the standard references
[11, 54, 84, 88, 126, 151, 155].
For given functions c(·) and g(·), define the total cost until stopping by
(1.2)
property of {ξ_n, n < ∞} and rewriting (1.2) as follows. For x ∈ S − ∂S,
$$W(x) = c(x) + \sum_{y\in S} p(x, y)\,W(y).$$
Now let n → ∞. The first term on the right side goes to zero and the limit
of the second term is just the vector of costs with components defined by
(1.2).
For an alternative and often preferred method of writing the cost function
in vector form, one eliminates the states in the boundary set ∂S and uses
the reduced state space S − ∂S, as follows: Define r(x, y) = p(x, y) for
x, y ∈ S − ∂S and define the transition matrix R = {r(x, y); x, y ∈ S − ∂S}.
Set W = {W(x), x ∈ S − ∂S}, and define the vector of cost rates C =
W = RW + C. (1.5)
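For a finite chain, (1.5) is just a linear system. A minimal sketch (a toy birth-death chain with hypothetical data, not from the book) solves W = (I − R)⁻¹C directly:

```python
import numpy as np

# Expected total cost until exit, W = R W + C, on a symmetric random walk
# over interior states 1..n with absorption at 0 and n+1.
n = 50
R = np.zeros((n, n))
for i in range(n):
    if i > 0:     R[i, i - 1] = 0.5
    if i < n - 1: R[i, i + 1] = 0.5
C = np.ones(n)                             # unit cost per step
W = np.linalg.solve(np.eye(n) - R, C)      # W = (I - R)^{-1} C
# W[i] = expected steps to exit from state i+1; equals (i+1)(n+1-(i+1))
```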
$$W(x) = E_x \sum_{n=0}^{\infty} e^{-\beta n}\,c(\xi_n). \qquad (1.8)$$
W = RW + C. (1.10)
Equation (1.10) has a unique solution whether or not E_x N < ∞ for all
x, due to the fact that the discounting implies that Rⁿ → 0.
$$W(x) = E_x \sum_{n=0}^{N-1} \exp\Big[-\sum_{i=0}^{n-1}\beta(\xi_i)\Big]\,c(\xi_n) + E_x \exp\Big[-\sum_{i=0}^{N-1}\beta(\xi_i)\Big]\,g(\xi_N). \qquad (1.11)$$
Also, (1.9) and (1.10) continue to hold, but with β replaced by β(x) in the
line that calculates W(x).
$$P_x\{\xi_n \in \partial S^+ \text{ for all } n < \infty\} = 0, \quad\text{and}\quad \beta(x) = 0 \text{ for all } x \in \partial S^+. \qquad (1.12)$$
The above equation guarantees that the chain cannot get stuck on the
reflecting boundary. Suppose that the cost function W(x) of interest is still
(1.11) but with β(x) > 0 for x ∈ S − ∂S⁺. Then the recursive equation for
W(x) is
Equation (1.13) has a unique solution due to the discounting and (1.12).
Let us put the cost equation (1.13) into vector form. Define the discounted
substochastic matrix R = {r(x, y); x, y ∈ S − ∂S} by
$$r(x, y) = \begin{cases} e^{-\beta(x)}\,p(x, y), & x \in S - \partial S - \partial S^+ \\ p(x, y), & x \in \partial S^+. \end{cases} \qquad (1.14)$$
By the ergodic theorem for Markov chains [23, 84], for each x
There is an auxiliary function W(·) such that the pair (W(·), γ) satisfies
the equation [11, 88, 126]
$$W = PW + C - e\gamma, \qquad (1.18)$$
where e = (1, ..., 1) is the column vector all of whose components are
unity, P = {p(x, y); x, y ∈ S}, and C = {c(x); x ∈ S}. On the other hand,
suppose that (1.18) holds for some pair (W, γ). Premultiply each side of
(1.18) by π and use the fact that π = πP to get (1.16). One choice for W
is
$$W = \sum_{i=0}^{\infty} P^i(C - e\gamma),$$
which is well defined because P^iC → eγ at a geometric rate under our
conditions.
An alternative way of showing that the γ in (1.17) equals the value in
(1.16) involves iterating (1.17) n times to get
$$W = P^n W + \sum_{i=0}^{n-1} P^i(C - e\gamma).$$
Now divide by n and let n → ∞ to get (1.16) again. The function W(·) in
(1.18) is not unique. If any vector of the form ke is added to W(·), where
k is a real number, then (1.18) still holds for the new value. We will return
to a discussion of the ergodic cost problem for Markov chains in Chapter
7, where additional conditions for the existence of solutions to (1.17) are
given.
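The relations above are easy to check numerically. This sketch (hypothetical two-state data, not from the book) computes γ = πC from the stationary distribution and builds W from the truncated series:

```python
import numpy as np

# Average cost per step for a toy ergodic chain: gamma = pi'C, and the
# auxiliary W from the series W = sum_i P^i (C - e*gamma).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
C = np.array([1.0, 3.0])
e = np.ones(2)

evals, evecs = np.linalg.eig(P.T)              # stationary pi: pi = pi P
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
gamma = pi @ C

W, term = np.zeros(2), C - e * gamma
for _ in range(500):                           # geometric convergence
    W += term
    term = P @ term
assert np.allclose(W + e * gamma, P @ W + C)   # (1.18) holds
```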
(1.19)
Using the Markov property, as done in (1.3) or (1.7), for n < M and
x ∈ S − ∂S we can write (1.19) as
$$W(x, n) = E_{x,n} W(\xi_{n+1}, n+1) + c(x), \qquad (1.20)$$
where we use ξ_{n+1} to denote the state at absolute time n + 1. Note that
we are not using the "shifting initial time" terminology which is frequently
used (e.g., as in [49]) when working with Markov chains, but rather an
absolute time scale. The boundary conditions are
where R and C are the same as above (1.5) and the terminal boundary
condition is W(M) = {g(x, M), x ∈ S − ∂S}.
problems occur in the timing of the buying or selling of assets and in the
theory of reliability and maintenance, and additional examples are given in
the references on stochastic control.
Definition. Let N denote a random variable with values in the set [0, oo].
N will be the time at which the chain is stopped. If the chain is never
stopped for some sample path, then N = oo on that path. In the sequential
sampling problem above, the value N = oo would occur if sampling never
stopped. We say that N is an admissible stopping time or simply a stopping
time for the chain {ξ_n, n < ∞} if it "does not depend on the future." More
precisely, N is an admissible stopping time if for any n, m > n, and function
F(·), we have
Loosely speaking, for any n, the event {N = n} might depend only on the
past and current values of the state {ξ_i, i ≤ n}, or it might depend on other
quantities as well, as long as the above "Markov property" is preserved.
for some given functions c(·) and g(·). In Subsection 2.1.2, N was the first
entrance time into the a priori selected set ∂S. If the state dependent
discount factor β(x) > 0 is used, then replace (2.1) by
$$W(x, N) = E_x \sum_{n=0}^{N-1} \exp\Big[-\sum_{i=0}^{n-1}\beta(\xi_i)\Big]\,c(\xi_n) + E_x \exp\Big[-\sum_{i=0}^{N-1}\beta(\xi_i)\Big]\,g(\xi_N). \qquad (2.2)$$
Before writing the functional equation for the optimal cost, let us consider
the special case where N is a stopping time which is a priori defined to
be the first entrance time of the chain into a selected set and use the Markov
property to get the functional equation for the associated cost. In particular,
suppose that there is a set S₀ ⊂ S such that N = min{n : ξ_n ∈ S₀}.
Then the functional equation (1.9) holds for the cost (2.2). In particular,
$$W(x) = \begin{cases} e^{-\beta(x)}\,E_x W(\xi_1) + c(x), & x \in S - S_0 \\ g(x), & x \in S_0. \end{cases} \qquad (2.3)$$
where the infimum is over all the admissible stopping times for the chain.
Owing to the discounting, V(x) is bounded. We now describe the funda-
mental procedure for obtaining a functional equation for V(x). Suppose
that the current time is n, the current state is ξ_n = x, and that the process
has not yet been stopped. We need to decide whether to stop (and attain
an immediate and final cost g(x)) or to continue. If we allow the process
to continue, whether or not that decision is optimal, then there is an im-
mediately realized cost c(x) as well as costs to be realized in the future. If
we allow the process to continue, then the next state is en+l· Suppose that
we continue, but follow an optimal decision policy from the next time on.
Then, by the definition of V(x), the optimal discounted cost as seen from
the next time is V(en+l)· Its mean value, discounted to the present, is
Thus, if we continue at the current time and then act optimally from the
next time on, the total discounted cost, as seen from the present time, is
e-.8(x)ExV(et) +c(x).
We still need to choose the decision to be taken at the present time. The
optimal decision to be taken now is the decision which attains the minimum
of the costs over the two possibilities, which are: (1) stop now; (2) continue
and then act optimally in the future. Hence, the optimal cost must satisfy
the equation
$$V(x) = \min\big\{ g(x),\; e^{-\beta(x)}E_x V(\xi_1) + c(x) \big\}. \qquad (2.4)$$
If the two terms in the bracket in (2.4) are equal at some x, then at that x
it does not matter whether we stop or continue. The term in (2.4) which is
the minimum tells us what the optimal action is. The set S₀ = {x : V(x) =
g(x)} is known as the stopping set. For x ∉ S₀, (2.4) implies that we should
continue; otherwise, we should stop.
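Equation (2.4) can be solved by a simple fixed-point iteration, since the discounting makes the right-hand side a contraction (see the discussion of (2.7) below). A minimal sketch with hypothetical data:

```python
import numpy as np

# Value iteration for the optimal stopping equation (2.4):
# V = min(g, exp(-beta) * P V + c), on a toy chain.
n = 20
P = np.full((n, n), 1.0 / n)          # transition matrix p(x, y)
c = 0.1 * np.ones(n)                  # continuation cost c(x)
g = np.linspace(0.0, 2.0, n)          # stopping cost g(x)
beta = 0.05 * np.ones(n)              # state-dependent discount rate

V = np.zeros(n)
for _ in range(1000):
    V_new = np.minimum(g, np.exp(-beta) * (P @ V) + c)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
stop_set = np.where(V >= g - 1e-12)[0]   # S0 = {x : V(x) = g(x)}
```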
The Principle of Optimality. The method just used for the derivation
of (2.4) is known as the principle of optimality. It is the usual method
for getting the functional equations which are satisfied by optimal value
functions for control problems for Markov process models when the control
at any time can depend on the state of the process at that time. It will also
and the degenerate transition matrix R(u) = {r(x, y|u(x)); x, y ∈ S}. Let
W(u) = {W(x, u), x ∈ S} denote the vector of total discounted costs
(2.2) under the feedback control u(·), and let V = {V(x), x ∈ S} denote
the vector of least costs. Then we can write (2.4) in the following |S|-
dimensional vector form:
Since the discounting implies that
$$R^n(u) \to 0, \qquad (2.7)$$
we have
$$V = \sum_{n=0}^{\infty} R^n(u)\,C(u). \qquad (2.8)$$
A Note on the "Contraction" (2.7). Note the critical role that (2.7)
played. If the n-step discounted transition probabilities did not go to zero,
we could not have obtained the uniqueness and possibly not even the in-
terpretation of the solution to (2.5) as the minimum cost vector. A similar
property is needed for the general control problem with a Markov chain
model and will be dealt with in Section 2.4 in more detail. Such properties
are also very useful in dealing with the convergence proofs for numerical
methods for solving equations such as (2.5). A proof of convergence in
a setting where the contraction property is absent is given in Subsection
15.3.3.
Remark. The above discussion showed only that the minimizing control in
(2.4) or (2.5) is the optimal control/stopping decision function with respect
to the class of comparison stopping rules which are also determined by first
entrance times into sets in the state space or, equivalently, by feedback
control laws, and that V(x) is the least cost only in this class. But it is
also true that the optimality of u(·) in this class implies optimality with
respect to all admissible stopping times. The proof is essentially the same
as that used above for the pure Markov rules, except that the alternatives u(ξ_n)
are replaced by a sequence of appropriate "admissible" decision variables
{u_n}; the details are omitted [88, 138].
Alternative Conditions. Define c₀ = min_x c(x), and suppose that c₀ > 0.
Then we need only consider stopping times N which satisfy
$$E_x N \le \frac{2\sup_x |g(x)|}{c_0}. \qquad (2.9)$$
Indeed, if E_x N exceeds this bound, then the total cost is at least
c₀E_xN − sup_x|g(x)| > sup_x|g(x)|, which is no less than the cost g(x) of stopping at time 0.
Thus, it would have been preferable to stop at the initial time rather than
at N. Hence, we can assume (2.9). This implies that V(x) is finite. Also,
Rⁿ(u) → 0 for all pure Markov u(·) for which (2.9) holds. By this result
and the principle of optimality, V(·) satisfies
An obligatory stopping set can also be added and the comments made in
the paragraph just above Subsection 2.2.2 hold for this case too.
the control parameter α ∈ U for each x, y. Now, suppose that β(x, α) > 0
for each x, α. For an admissible control sequence u, define the cost
The modifications which are needed if we are obliged to stop when first
reaching a selected "boundary" set ∂S will be stated below.
Let V(x) denote the infimum of the costs W(x, u) over all admissible
control sequences. V(x) is finite for each x due to the discounting and
satisfies the dynamic programming equation
$$V(x) = \min_{\alpha\in U}\Big[ e^{-\beta(x,\alpha)}\sum_y p(x, y|\alpha)\,V(y) + c(x, \alpha) \Big]. \qquad (3.2)$$
Then, as in Section 2.2, (3.3) implies (where the inequality is for each
component)
$$V = R(u)V + C(u) \le R(\tilde u)V + C(\tilde u), \qquad (3.4)$$
where u(x) is the minimizing control in (3.2) or in the xth line of (3.3) and
ũ(·) is any other feedback control. As in the last section, the discounting
implies (2.7) and that (2.8) holds, from which follows both the uniqueness
of the solution to (3.3) and the fact that the solution is the minimum value
function over all feedback control alternatives. A similar proof shows that
u(·) is optimal with respect to all admissible control sequences.
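A sketch of value iteration for (3.2) with hypothetical data (two actions, random transition matrices, constant discount rate); the minimizing action at each state gives the optimal feedback control:

```python
import numpy as np

# Value iteration for the discounted control problem (3.2):
# V(x) = min_a [ e^{-beta} sum_y p(x,y|a) V(y) + c(x,a) ].
n, n_actions = 30, 2
rng = np.random.default_rng(5)
P = rng.dirichlet(np.ones(n), size=(n_actions, n))   # P[a, x, :] = p(x, .|a)
c = rng.uniform(0.5, 1.5, (n_actions, n))            # c(x, a)
beta = 0.1

V = np.zeros(n)
for _ in range(2000):
    Q = np.exp(-beta) * np.einsum('axy,y->ax', P, V) + c
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new
u = Q.argmin(axis=0)   # optimal feedback control u(x)
```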
$$W(x, u) = E_x^u \sum_{n=0}^{N-1} \exp\Big[-\sum_{i=0}^{n-1}\beta(\xi_i, u_i)\Big]\,c(\xi_n, u_n) + E_x^u \exp\Big[-\sum_{i=0}^{N-1}\beta(\xi_i, u_i)\Big]\,g(\xi_N). \qquad (3.5)$$
Then the dynamic programming equation is (3.2) for x ∉ ∂S, with the
boundary condition V(x) = g(x), x ∈ ∂S. All of the previous results hold
here also, and a "reflecting boundary" ∂S⁺ can be added as in Section 2.1.
Next, working as in the previous sections, we put (4.2) into vector form.
For a feedback control u(·), define the reduced transition matrix R(u) =
Note that the R(u) which appeared in (2.5) and (3.3) are contractions for
all feedback u(·). If R(u) is a contraction for each feedback control u(·),
then the arguments of the previous two sections can be repeated to show
that the solution to (4.2) or (4.4) is unique, that it is the optimal cost, and
that the minimizing feedback control is an optimal control with respect to
all admissible controls.
A4.1. There is a c₀ > 0 such that c(x, α) ≥ c₀, and a feedback control u₀(·)
such that R(u₀) is a contraction.
In this case, the positivity of the running cost rate c(·) implies that W(x, u)
might be unbounded for some controls. But the existence of uo(·) in (A4.1)
guarantees that there is a feedback control u(·) which is optimal with re-
spect to all admissible controls and such that R(u) is a contraction and that
(4.4) holds. The proofs of such assertions can be found in many places in the
literature on control problems with Markov chain models [11, 88, 126, 132].
It is important to recall that R(u) being a contraction is equivalent
to E_x^u N < ∞ for all x because the state space is finite, where N is defined
above (4.1). This latter interpretation is useful because it can often
be checked by inspection of the transition matrices, for particular controls
u(·). In particular, for each x ∉ ∂S there needs to be a chain of states of
positive probability under u(·) which leads to some state in ∂S.
The concept of contraction plays a fundamental role in the proofs of
convergence of the numerical methods used in Chapter 6.
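Numerically, the contraction property is easy to test: for a finite chain, R(u) is a contraction (Rⁿ(u) → 0) exactly when its spectral radius is below one, equivalently when E_xN = [(I − R(u))⁻¹1](x) is finite for all x. A sketch with a hypothetical R(u):

```python
import numpy as np

# R(u) restricted to S - dS; some transition mass leaks into the target set.
R = np.array([[0.5, 0.3],      # rows sum to 0.8 and 0.9: substochastic
              [0.2, 0.7]])
print(np.max(np.abs(np.linalg.eigvals(R))) < 1.0)   # True: contraction
print(np.linalg.solve(np.eye(2) - R, np.ones(2)))   # E_x N, finite
```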
(5.1)
where we use the notation E^u_{x,n} to denote the expectation given that ξ_n = x
and that the control sequence {u_n, ...} is used. As in Subsection 2.1.4,
N = min{n : ξ_n ∈ ∂S}. Letting V(x, n) denote the infimum of W(x, n, u)
over all admissible controls and using the principle of optimality, we get
that the dynamic programming equation is
for x ∈ S − ∂S, n < M. The boundary conditions are the same as in (1.21),
namely, V(x, n) = g(x, n) for x ∈ ∂S or n = M.
3
Dynamic Programming Equations
numerical schemes. With this in mind it makes little sense to maintain any
pretense regarding mathematical rigor in these derivations. The "validation"
for any derived equation will come in the form of a rigorous conver-
gence proof for numerical schemes suggested by this equation. Thus, the
formal derivations themselves are not used in any direct way. Our motiva-
tion for including them is to provide a guide to similar formal derivations
that might be useful for less standard or more novel stochastic control
problems. For a more rigorous development of the dynamic programming
equations we refer the reader to [56] and [58].
Because the process is stopped upon hitting ∂G, we refer to the associated
boundary condition as absorbing. From a formal point of view, W(·) satisfies
the equation
$$\mathcal{L}W(x) + k(x) = 0, \quad x \in G^0, \qquad (1.4)$$
where 𝓛 is the differential operator of (1.1). A formal derivation of (1.4) is
as follows. Under broad conditions we have P_x{τ ≤ Δ}/Δ → 0 as Δ → 0
for x ∈ G⁰. Suppose that W(·) is bounded and in C²(Ḡ). For Δ > 0 we
may write
where the second equality follows from the Markov property and the definition
of W(·). It follows that
where
$$h(\tau, \Delta) = W(x(\Delta)) - \int_{\tau\wedge\Delta}^{\Delta} k(x(s))\,ds - g(x(\tau))$$
is bounded uniformly in ω and Δ. Therefore, under the condition P_x{τ ≤
Δ}/Δ → 0 as Δ → 0, the right hand side of the last equation tends to zero
as Δ → 0. Applying Ito's formula to the left hand side of (1.5) and sending
Δ → 0, we formally obtain (1.4).
The boundary conditions are not so obvious, since not all the points on
the boundary are necessarily reachable by the process. Only a few comments
will be made on this point. We define a regular point of ∂G to
be any point x ∈ ∂G such that, for all δ > 0,
$$\lim_{y\to x,\ y\in G^0} P_y\{\tau \le \delta,\ |x(\tau) - x| \le \delta\} = 1.$$
Thus, a regular point is a point such that if the process starts nearby, then
exit is virtually assured in an arbitrarily small time and in an arbitrarily
small neighborhood. The reader is referred to [83] for further discussion
of the distinction between regular points and those points which are not
regular.
Now suppose that x ∈ ∂G is regular. Then E_y ∫₀^τ k(x(s))ds → 0 as
y → x. Furthermore, the continuity of g implies that E_y g(x(τ)) → g(x)
as y → x. Combining, we have W(y) → g(x) as y → x. Thus, the correct
boundary condition is W(x) = g(x) for regular points x of ∂G.
The Verification Theorem. Suppose that (1.4) holds for W(·) bounded
and smooth inside G, and that the probability that x(·) exits G⁰ through
the set of regular points is unity for each initial condition x ∈ G⁰. Then, by
Ito's formula, for t < ∞ we have
$$W(x) = E_x \int_0^\infty e^{-\beta s}\,k(x(s))\,ds. \qquad (1.6)$$
If W(x) is sufficiently smooth, then one can easily show it satisfies the
equation
$$\mathcal{L}W(x) - \beta W(x) + k(x) = 0, \quad x \in \mathbb{R}^k. \qquad (1.7)$$
The idea is essentially as follows. Following the argument of the previous
subsection, for Δ > 0 we write
or
$$W_x(x)'\,r(x) = 0 \qquad (1.9)$$
x and that at x + r(x)δ are the same for the approximation to the reflected
process, which suggests (1.9).
Part of the boundary can be absorbing and part reflecting. In order to
contain the discussion, we describe only the special case where there is an
outer boundary that is reflecting and an inner boundary that is absorbing.
Let G₁ and G be compact sets, each the closure of its interior, and with
G₁ ⊂ G⁰. The process will be reflected instantaneously to the interior of G
on hitting its boundary, and it will be absorbed on hitting the boundary
of G₁. Define τ = inf{t : x(t) ∈ G₁}, and consider the cost functional
(1.8). Then W(·) formally satisfies (1.7) but with the boundary condition
W(x) = g(x) on the boundary of G₁ and W_x(x)′r(x) = 0 on the boundary
of G.
$$\gamma = \lim_{t\to\infty} \frac{E_x \int_0^t k(x(s))\,ds}{t} \qquad (1.10)$$
exists. One can formally derive a PDE which yields γ, but we only discuss
a verification theorem. Suppose that there is a smooth function W(·) and
a constant γ which satisfy the equation
From a formal point of view, it can be shown that W(·) satisfies the PDE
$$\frac{\partial W(x, t)}{\partial t} + \mathcal{L}W(x, t) + k(x) = 0$$
for x ∈ G⁰, t < T, together with W(x, T) = g(x, T). We also have W(y, t) →
g(x, t) as y → x ∈ ∂G for regular points x and
$$\mathcal{L}W(x) + k(x) = 0, \quad x \in G^0,$$
For the jump diffusion, we obviously need to specify more than simply
a boundary condition, because at the time of exit the process may
be far removed from the set G. The condition becomes W(x) = g(x) for
x ∉ G⁰.
One can show that the pure Markov stopping times are admissible. (Ac-
tually, for the pure Markov stopping times to be admissible for all Borel
sets B requires an additional technical assumption on the filtration, namely,
right continuity [52]. In keeping with the spirit of this chapter we will not
worry about this issue.) Let k(·) and g(·) be bounded and continuous real
valued functions, with inf_x k(x) ≥ k₀ > 0. For an admissible stopping time τ,
define the cost
and the optimal cost, where the infimum is taken over all admissible stopping
times:
$$V(x) = \inf_\tau W(x, \tau).$$
It is well known that the infimum is the same if it is taken over pure Markov
stopping times. Indeed, it is the optimal pure Markov stopping set that we
seek with the numerical methods. A further simplification, owing to the
positive lower bound on k, is that we need to consider only stopping times
whose mean value satisfies
$$E_x\tau \le \frac{2\sup_x |g(x)|}{k_0}.$$
This follows because if the mean value is larger, we will do better to stop
at t = 0.
where the set B is part of the solution. We will give a simple derivation.
The derivation assumes familiarity with the principle of optimality, which
is discussed in Chapter 2 for the optimal stopping problem. Suppose that
we restrict the times to be multiples of a small Δ > 0. Then at each
time nΔ, we have the choice of stopping or continuing. Given the current
state, the additional cost that we pay for immediate stopping is g(x(nΔ)).
Heuristically, the usual dynamic programming argument tells us that the
additional cost paid for continuing and using the optimal decisions in all
future steps is E_{x(nΔ)}[V(x((n+1)Δ)) + Δk(x(nΔ))]. Thus, heuristically,
$$V(x) = \min\big\{ g(x),\; E_x V(x(\Delta)) + \Delta k(x) \big\}.$$
Recalling that for x ∉ B we have V(x) < g(x), we see that for x ∉ B
the minimum must be taken on by the second term. If we divide by Δ, we
obtain
$$\frac{1}{\Delta}\big[E_x V(x(\Delta)) - V(x)\big] + k(x) = 0.$$
If we apply Ito's formula, use the assumption that V(·) is smooth, and then
send ~ to zero, the result follows. Depending on whether the boundary is
reflecting or absorbing, the appropriate boundary conditions are added.
where
$$\mathrm{tr}\big[f_{xx}(x)\,a(x)\big] = \sum_{ij} f_{x_i x_j}(x)\,a_{ij}(x).$$
Define
V(x) = infW(x,u),
where the infimum is over the admissible controls.
We now apply a formal dynamic programming argument to derive the
PDE which is satisfied by the optimal value function V(·). Suppose that
V(·) is as smooth as necessary for the following calculations to be valid.
Suppose that there is an optimal control ū(·) which is pure Markov. Let
Δ > 0, and let α be any value in U. Define ũ(·) to be the control process that
uses the feedback control ū(·) for t ≥ Δ and uses the control identically
equal to α for t < Δ. Define the process x̃(·) to be the process which
corresponds to use of the control ũ(·). Let τ̃ denote the time that the
target set is reached under this composite control. Let x(·) and τ denote
the solution and escape time under the optimal control ū(·). By definition,
we have
$$V(x) = E_x^{\bar u}\Big[\int_0^{\tau} k(x(s), \bar u(x(s)))\,ds + g(x(\tau))\Big].$$
The optimality of V(·) implies
Therefore
(3.5)
where
$$h(\tilde\tau, \Delta, u) = V(\tilde x(\Delta)) - \int_{\tilde\tau\wedge\Delta}^{\Delta} k(\tilde x(s), \alpha)\,ds - g(\tilde x(\tilde\tau))$$
is bounded uniformly in ω and Δ. If we assume (as in Subsection 3.1.1)
the condition P̃_x{τ̃ < Δ}/Δ → 0 as Δ → 0, then the right hand side of
(3.5) tends to zero as Δ → 0. Therefore, taking this limit yields that, for
any value of α in U,
$$\mathcal{L}^\alpha V(x) + k(x, \alpha) \ge 0.$$
Suppose in the calculations above that α is replaced by ū(x̃(s)) on [0, Δ),
and that ū(·) is continuous at x. Then the analogue of (3.5) holds with the
inequality replaced by an equality. We then formally obtain the equation
$$\inf_{\alpha\in U}\big[\mathcal{L}^\alpha V(x) + k(x, \alpha)\big] = 0, \quad x \in G^0, \qquad (3.6)$$
$$V(x) = g(x), \quad x \in \partial G.$$
It should also be noted that
for admissible u.
It should also be noted that
for admissible u.
for all values of x ∈ G⁰, t ≥ 0, and ω. Let τ and τ̃ denote the escape times
under the controls ū(·) and ũ(·), respectively. Then by Ito's formula we can
write
$$= E_x^{\bar u}\int_0^{t\wedge\tau} k(x(s), \bar u(x(s)))\,ds$$
and
with the absorbing boundary condition on 8C and with the set B part of
the solution.
As another alteration, let the "local discount rate" depend on the state.
In particular, define Λ(t) = exp[−∫₀ᵗ β(x(s))ds] for some bounded, continuous,
and nonnegative function β(·). Let the cost be (4.1), with e^{−βt}
replaced by Λ(t). Then the Bellman equation for the optimal cost is just
(4.2), with the β replaced by β(x).
Let the cost be (4.1), and let the process be the reflected form of ( 1.1),
with reflection set 8G and reflection direction r(x) as in Section 1.3. Then
the Bellman equation is
Let the infimum be taken on by a feedback function ū(·) under which the
reflected diffusion is well defined. Then one can show that ū(·) is optimal
with respect to any admissible control u(·) for which
$$E_x^u V(x(t))/t \to 0$$
as t → ∞. The proof of a verification theorem for this problem combines
the ideas of Subsection 3.1.4 and Section 3.3.
4
The Markov Chain Approximation
Method: Introduction
The main purpose of the book is the development of numerical methods for
the solution of control or optimal control problems, or for the computation
of functionals of the stochastic processes of interest, of the type described
in Chapters 3, 7-9, and 12-15. It was shown in Chapter 3 that the cost or
optimal cost functionals can be the (at least formal) solutions to certain
nonlinear partial differential equations. It is tempting to try to solve for
or approximate the various cost functions and optimal controls by deal-
ing directly with the appropriate PDE's, and numerically approximating
their solutions. A basic impediment is that the PDE's often have only a
formal meaning, and standard methods of numerical analysis might not be
usable to prove convergence of the numerical methods. For many problems
of interest, one cannot even write down a partial differential equation. The
Bellman equation might be replaced by a system of "variational inequalities,"
or the proper form might not be known. Optimal stochastic control
problems occur in an enormous variety of forms. As time goes on, we learn
more about the analytical methods which can be used to describe and an-
alyze the various optimal cost functions, but even then it seems that many
important classes of problems are still not covered and new models appear
which need even further analysis. The optimal stochastic control or stochas-
tic modeling problem usually starts with a physical model, which guides
the formulation of the precise stochastic process model to be used in the
analysis. One would like numerical methods which are able to conveniently
exploit the intuition contained in the physical model.
The general methods developed in this book can be applied to a very
broad class of stochastic and deterministic control problems, as well as to
Let G be a compact set which is the closure of its interior G0 . For {3 > 0,
we consider the discounted cost
where τ = inf{t : x(t) ∉ G⁰}, the first escape time of x(·) from G⁰. Define
V(x) = inf_u W(x, u),
where the infimum is over all admissible controls. Recall (Section 1.3) that
an admissible control u( ·) is a measurable process which is nonanticipative
with respect tow(·), and u(t) takes values in U, a compact set. The devel-
opment will be entirely formal, because we are concerned with motivation
only. But in order to be certain that we are on solid ground, let us sup-
pose here that the diffusion is well defined for any admissible control. In
particular, suppose that:
A1.1. b(·) and σ(·) are bounded, continuous, and Lipschitz continuous in
x, uniformly in u. Both k(·) and g(·) are bounded and continuous.
The approximating chain {ξ^h_n, n < ∞} is required to be locally consistent with (1.1) in the sense that
$$E^{h,\alpha}_{x,n}\,\Delta\xi^h_n = b(x, \alpha)\,\Delta t^h(x, \alpha) + o(\Delta t^h(x, \alpha)),$$
$$\mathrm{cov}^{h,\alpha}_{x,n}\,\Delta\xi^h_n = a(x)\,\Delta t^h(x, \alpha) + o(\Delta t^h(x, \alpha)), \quad a(x) = \sigma(x)\sigma'(x),$$
$$\sup_{n,\omega}\,|\xi^h_{n+1} - \xi^h_n| \to 0 \quad\text{as } h \to 0.$$
Note that the chain has the "local properties" of the diffusion process (1.1)
in the following sense. By Section 1.3, letting x(0) = x and u(t) = α on the
interval [0, δ] in (1.1) gives us
$$E_x^\alpha\big[x(\delta) - x\big] = b(x, \alpha)\,\delta + o(\delta), \qquad \mathrm{cov}_x^\alpha\big[x(\delta) - x\big] = a(x)\,\delta + o(\delta). \qquad (1.5)$$
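As an illustration (a sketch; the one-dimensional construction and coefficients below are hypothetical, in the spirit of the constructions of Chapter 5), local consistency of a candidate transition function can be verified directly by computing the conditional mean and variance of the increment:

```python
import numpy as np

# Chain on the grid hZ with p(x, x+h) = q_plus, p(x, x-h) = q_minus and
# interpolation interval dt: check E[dxi] = b(x)dt, Var[dxi] = a(x)dt + o(dt).
def check_consistency(x, h, b, a):
    dt = h**2 / a(x)                       # a common choice of interval
    q_plus = 0.5 + h * b(x) / (2 * a(x))   # one standard construction
    q_minus = 1.0 - q_plus
    mean = (q_plus - q_minus) * h          # equals b(x) dt exactly here
    var = h**2 - mean**2                   # equals a(x) dt + O(dt^2)
    return np.isclose(mean, b(x) * dt), np.isclose(var, a(x) * dt, rtol=1e-2)

print(check_consistency(0.3, 1e-3, b=lambda x: -x, a=lambda x: 1.0 + 0.5 * x**2))
```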
Let E^{u^h}_x denote the expectation, given that ξ^h_0 = x and that either an
admissible control sequence u^h = {u^h_n, n < ∞} or a feedback control denoted
by u^h(·) is used, according to the case.
(2.1)
[Figure: ξ^h(·) is piecewise constant, taking the value ξ^h_n on [t^h_n, t^h_{n+1}).]
Figure 4.1. Construction of the interpolation ξ^h(·).
wish (see Section 5.2), but we might find that restrictive. For example, if
the local velocity b(·) is large at some value of x, then we might want to
use a smaller interpolation interval there. Also, the numerical procedures
converge faster when we take advantage of the added flexibility allowed by
the variable intervals. The interpolated process ξ^h(·) is piecewise constant.
Given the value of the current state and control action, the current interval
is known. The interpolation intervals are obtained automatically when the
transition functions p^h(x, y|α) are constructed. See Chapter 5 and Sections
4.4 and 4.5 below.
Let N_h denote the first time that {ξ^h_n, n < ∞} leaves G⁰_h. Then, the
first exit time of ξ^h(·) from G⁰ is τ_h = t^h_{N_h}. There are several natural cost
functionals for the chain which approximate (1.2), depending on how time
is discounted on the intervals of constancy [t^h_n, t^h_{n+1}). If the discounting is
constant on this interval then we can use the approximation
$$W^h_1(x, u^h) = E^{u^h}_x \sum_{n=0}^{N_h-1} e^{-\beta t^h_n}\,k(\xi^h_n, u^h_n)\,\Delta t^h_n + E^{u^h}_x\,e^{-\beta\tau_h}\,g(\xi^h_{N_h}). \qquad (2.2)$$
We have
$$|V^h_1(x) - V^h_2(x)| \to 0.$$
The dynamic programming equation (2.3.2) for cost function (2.2) is
$$V^h(x) = \min_{\alpha\in U}\Big[ e^{-\beta\,\Delta t^h(x,\alpha)} \sum_y p^h(x, y|\alpha)\,V^h(y) + k(x, \alpha)\,\Delta t^h(x, \alpha) \Big], \quad x \in G^0_h,$$
with V^h(x) = g(x) for x ∉ G⁰_h. (2.4)
The dynamic programming equation for V₂^h(x) is the same, except that the
coefficient Δt^h(x, α) of k(x, α) is replaced by
$$\int_0^{\Delta t^h(x,\alpha)} e^{-\beta s}\,ds = \big[1 - e^{-\beta\,\Delta t^h(x,\alpha)}\big]/\beta = \Delta t^h(x, \alpha) + O\big((\Delta t^h(x, \alpha))^2\big). \qquad (2.5)$$
A third possibility for the approximation of the discount factor appears in
(3.7), and is based on the continuous parameter Markov chain interpolation.
The difference between the solutions of (2.4) and (3.7) goes to zero as h → 0.
Discussion. The similarity of the cost functions (2.2) and (2.3) to (1.2) and
the similarity of the local properties of the interpolation ξ^h(·) to those of
the original controlled diffusion x(·) suggest that the V^h(x) might be good
approximations to V(x) for small values of h. This turns out to be true.
Any sequence ξ^h(·) has a subsequence which converges in an appropriate
sense to a controlled diffusion of the type (1.1). This will be dealt with in
Chapters 9 and 10. Suppose that u^h(x) is the optimal control for the chain
{ξ^h_n, n < ∞} with cost function (say) (2.2), and suppose that the associated
sequence ξ^h(·) converges to a limit diffusion x(·) with admissible control
u(·). Under quite broad conditions, the sequence τ_h of times that the chains
first exit G⁰_h will also converge to the time that the limit process x(·) first
exits G⁰. If this is the case, then the cost functionals V^h(x) for the sequence
of chains will converge to the cost functional W(x, u) for the limit process.
Because V(x) is the optimal value function, we have that W(x, u) ≥ V(x)
and, hence, lim inf_h V^h(x) ≥ V(x). The reverse inequality will be proved
by another approximation procedure, which uses the optimality of the cost
functionals V^h(x) for the controlled chain. For the mathematical proof of
the convergence, we might need to extend the class of allowed controls to
a class of so-called "relaxed controls," but the infimum of the cost function
over the original class of controls and that over the new class of controls
are equal. A good example of the entire procedure is given in Sections 4.5
and 4.6 below for a simple deterministic problem.
The Markov chain approximation method is thus quite straightforward:
(a) get a locally consistent chain; (b) get a suitable approximation to the
original cost function for the chain.
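Step (a) can be checked mechanically. The following is a minimal sketch (in Python, not from the text) of the local consistency test (4.1.3) for a scalar chain; the transition probabilities in the demonstration are the usual ones for dx = b dt + dw on an h-grid, and all names are ours.

    import numpy as np

    def increment_moments(x, transitions):
        # transitions: list of (y, prob) pairs for one step of the chain at x.
        dy = np.array([y - x for y, _ in transitions])
        p = np.array([q for _, q in transitions])
        mean = float(p @ dy)
        var = float(p @ dy**2) - mean**2
        return mean, var

    # Chain for dx = b dt + dw: p^h(x, x +- h) = (1 +- hb)/2, dt^h = h^2.
    h, b = 0.01, 0.5
    mean, var = increment_moments(0.0, [(h, (1 + h*b)/2), (-h, (1 - h*b)/2)])
    dt = h**2
    print(mean - b*dt)    # zero: the conditional mean is b(x) dt exactly
    print(var - 1.0*dt)   # O(h^4): the conditional variance is a(x) dt + o(dt)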
state at the start of the n-th interval are known, the length of the interval is known. It is sometimes more convenient for the proofs of convergence to use a continuous parameter interpolation of {ξ_n^h, n < ∞} which is a Markov process itself. We now construct such an interpolation, which will be denoted by ψ^h(·). Define τ_0^h = 0, let {τ_n^h, n < ∞} denote the moments of change of ψ^h(·), and set Δτ_n^h = τ_{n+1}^h − τ_n^h. Define ψ^h(·) at the τ_n^h by

For δ > 0, define the increment Δ_δ ψ^h(t) = ψ^h(t + δ) − ψ^h(t). The local properties of ψ^h(·) follow from (1.3) and the above definitions and are
= E;" 1 7
" e-f3tk('¢h(t),uh(t))dt + E;" e-f3r"g('tf}(rh)), (3.6)
Define V^h(x) = inf_u W^h(x, u), where the infimum is over all admissible control sequences. Then the dynamic programming equation for the controlled chain {ξ_n^h, n < ∞} and cost (3.6) is
V^h(x) = \min_{\alpha \in U^h} \Big[ \frac{1}{1 + \beta\,\Delta t^h(x,\alpha)} \sum_y p^h(x, y|\alpha)\,V^h(y) + k(x,\alpha)\,\frac{\Delta t^h(x,\alpha)}{1 + \beta\,\Delta t^h(x,\alpha)} \Big],   (3.7)
for x ∈ G_h^0 and with the boundary condition V^h(x) = g(x), x ∉ G_h^0. We have |V^h(x) − V_1^h(x)| → 0, and any of the above dynamic programming equations can be used with the same asymptotic results. We will return to this equation in connection with the example in Section 4.5.
processes ψ^h(·) and x(·). Let us define the following "limit of conditional expectations." By (3.4) and the definition of b^h(·) in (1.3),

Now, factoring out this conditional mean rate of change of ψ^h(·) or "compensator," as it is commonly called, and letting x(0) = ψ^h(0) = x, we can write [the expression defines M^h(·)]

The jumps of M^h(·) are those of ψ^h(·) and, hence, go to zero as h → 0. Between the jumps, the process is linear in t. The process M^h(·) is a martingale whose quadratic variation is ∫_0^t a^h(ψ^h(s), u^h(s)) ds, where a^h(·) is defined by (1.3) and also equals
where we require that |u(t)| ≤ 1. Let G be the interval [0, B], where B > 0 is supposed to be an integer multiple of h. Let h ≤ 1, and define the transition probabilities for the approximating Markov chain as:

p^h(x, x \pm h|\alpha) = \frac{1 \pm h\alpha}{2}.
It is easily verified that the chain is locally consistent with x(·) in the sense of (1.3). The interpolation interval is just h². The dynamic programming equation for the cost function W_1^h(x, u) of (2.2) is given by (2.4); namely,
V^h(x) = \min_{\alpha} \frac{1}{1 + \beta h^2}\Big[ \frac{1 + \alpha h}{2}\,V^h(x + h) + \frac{1 - \alpha h}{2}\,V^h(x - h) + k(x,\alpha)\,h^2 \Big].   (4.4)
Note that (4.4) is the dynamic programming equation for a discounted cost problem for a controlled random walk with Δt^h(x,α) = h² and discount factor 1/(1 + βΔt^h(x,α)). It is just (3.7) and is an O(h⁴) approximation to (4.2). The consistency of the equation (4.4), obtained by using the finite
difference method with the dynamic programming equation obtained for
the process '1/Jh(-) in the last section or with (4.2), suggests that the control
problem for the Markov chain might actually be a good approximation to
that for the original problem. Appropriate finite difference approximations
can often be used to obtain approximating Markov chains, as will be seen in
Chapter 5. But we emphasize that from the point of view of the convergence
proofs it is the controlled process which is approximated and not the formal
PDE (4.3).
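As a hedged illustration of how an equation such as (4.4) is actually solved, the following sketch runs an approximation-in-value-space iteration on the h-grid of [0, B]; the running cost k and boundary cost g are illustrative choices of ours, not from the text, and the in-place update anticipates the Gauss-Seidel methods of Chapter 6.

    import numpy as np

    # Value iteration for (4.4) on the h-grid of [0, B]. Boundary states keep
    # the (illustrative) stopping cost g.
    B, h, beta = 1.0, 0.05, 1.0
    k = lambda x, a: x**2 + 0.5 * a**2          # illustrative running cost
    g = lambda x: 0.0                           # illustrative boundary cost
    xs = np.arange(0.0, B + h/2, h)
    alphas = np.linspace(-1.0, 1.0, 21)         # discretized control set U^h
    V = np.array([g(x) for x in xs])
    for _ in range(2000):                       # fixed-point sweeps
        for i in range(1, len(xs) - 1):         # in-place (Gauss-Seidel) update
            V[i] = min(((1 + h*a)/2 * V[i+1] + (1 - h*a)/2 * V[i-1]
                        + k(xs[i], a) * h**2) / (1 + beta * h**2)
                       for a in alphas)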
W(x, u) = \int_0^\infty e^{-\beta t} k(x(t), u(t))\,dt, \quad \beta > 0.   (5.1b)
where u_n^h is the actual control which is used at the n-th update. Define the sequence u^h = {u_n^h, n < ∞} and the interpolation interval Δt_n^h = Δt^h(ξ_n^h, u_n^h). Define t_n^h = Σ_{i=0}^{n-1} Δt_i^h. Define the continuous parameter process ξ^h(·) by ξ^h(t) = ξ_n^h, t ∈ [t_n^h, t_{n+1}^h). It is the ξ^h(·) which approximates the solution of (5.1a). A reasonable approximation to the cost function (5.1b) is

W^h(x, u^h) = \sum_{n=0}^{\infty} e^{-\beta t_n^h} k(\xi_n^h, u_n^h)\,\Delta t_n^h.
The dynamic programming equation for the optimal cost is

V^h(x) = \min_{\alpha \in U^h}\Big[ e^{-\beta \Delta t^h(x,\alpha)}\,V^h\big(x + b(x,\alpha)\,\Delta t^h(x,\alpha)\big) + k(x,\alpha)\,\Delta t^h(x,\alpha) \Big].   (5.2)
time Δt^h(x,α). Refer to the figure, where the values are plotted for two values of α. Let Y^h(x,α) denote the corners of the triangle in which z(x,α) falls. For example, in the figure Y^h(x,α₂) = {x, y₁, y₂}. We can represent the point z(x,α) as a convex combination of the points in Y^h(x,α). Let p^h(x, y|α) denote the weights used for the convexification. These weights are nonnegative and sum to unity. Hence, they can be considered to be transition probabilities for a controlled Markov chain whose state space is just the set of all corner points in the figure.
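A minimal sketch of this convexification step: given the corners Y^h(x,α) of a (nondegenerate) triangle and the point z(x,α) inside it, the barycentric weights below are nonnegative, sum to one, and reproduce z(x,α) as the conditional mean. The function name and demonstration values are ours.

    import numpy as np

    def barycentric_weights(z, corners):
        # Weights p_i >= 0 with sum 1 and sum_i p_i y_i = z; they serve as the
        # transition probabilities p^h(x, y|a) to the corner states.
        y0, y1, y2 = (np.asarray(c, float) for c in corners)
        A = np.column_stack([y1 - y0, y2 - y0])   # nonsingular for a proper triangle
        w = np.linalg.solve(A, np.asarray(z, float) - y0)
        return np.array([1.0 - w.sum(), w[0], w[1]])

    # z(x, a) = x + b(x, a) dt lands in the triangle {x, y1, y2}:
    p = barycentric_weights([0.2, 0.3], [[0, 0], [1, 0], [0, 1]])
    print(p)    # [0.5, 0.2, 0.3]: nonnegative, sums to one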
Write Δξ_n^h = ξ_{n+1}^h − ξ_n^h in the form Δξ_n^h = E_n^h Δξ_n^h + (Δξ_n^h − E_n^h Δξ_n^h). Then

We will next show that only the "mean increments" (the first sum on the right) of ξ^h(·) are important and that there is some control such that ξ^h(·) is a good approximation to a solution to (5.1a) under that control.
The right hand sum in (5.5) is a continuous time interpolation of a martingale. By (5.4b), its variance is E \sum_{n: t_{n+1}^h \le t} \Delta t_n^h\,O(h) = O(h)t. By (1.1.3), this implies that for any t < ∞

\lim_{h \to 0} E \sup_{s \le t}\Big| \sum_{n: t_{n+1}^h \le s} \big(\Delta \xi_n^h - E_n^h \Delta \xi_n^h\big)\Big|^2 = 0.

Thus the effects of that right hand term in (5.5) disappear in the limit. The basic reason for this is that the spatial and the temporal scales are both of the order of h.
We write the right hand sum in (5.5) simply as O(h), and this is the order of that term for approximations of deterministic problems in general. Define the continuous parameter interpolation u^h(·) by u^h(t) = u_n^h on the interval [t_n^h, t_{n+1}^h). Now, using (5.4a), we have

\xi^h(t) = x + \int_0^t b(\xi^h(s), u^h(s))\,ds + O(h).   (5.6)
We now proceed to show that there is some admissible control such that the paths of ξ^h(·) are actually good approximations to a solution of (5.1a) under that control. Because Δξ_n^h = O(h) and Δt_n^h ≥ k₁h, the (piecewise linear interpolations of the) paths of the process ξ^h(·) are equicontinuous (in ω and h). Thus, for each fixed value of the probability space variable ω, each subsequence of {ξ^h(·)} has a further subsequence which converges to some limit uniformly on each bounded time interval. Suppose that the same were true of the sequence of interpolated control paths u^h(·). We
note now, for future reference, that this is hard to guarantee. This will be the reason for the introduction of an expanded class of admissible controls in Section 4.6 below. However, until further notice, we do proceed under the assumption that the paths of the control processes are equicontinuous in h and ω. Even if this assumption is not true, the convergence result for V^h(x) will remain true. The details for the general case will be given in Section 4.6. Continuing, fix the sample space variable ω and let h_n(ω) index a convergent subsequence of (the piecewise linear interpolations of) {ξ^h(·), u^h(·)}, with limit denoted by x(·,ω), u(·,ω). By the convergence and the compactness of U, we have u(t,ω) ∈ U. The uniform convergence implies that

x(t,\omega) = x + \int_0^t b(x(s,\omega), u(s,\omega))\,ds.   (5.7)

Thus, the limit path satisfies the original ODE (5.1a) with an admissible control. Also, it is easily seen that V^{h_n(ω)}(x) converges to W(x, u(ω)). Due to the minimality of V(x), we have
except possibly at the points iδ. Fix ω and choose a convergent subsequence of ξ^{h,ε}(·) (the sequence need not be the same for each ω). Let x^ε(·,ω) denote the limit. Then, following the analysis which led to (5.7), we get that

The limit paths x^ε(·,ω) are all the same, irrespective of the chosen subsequence or of ω, because the solution to (5.1a) is unique under the chosen control u^ε(·). This implies that the sequence W^h(x, u^{h,ε}) converges to the ε-optimal cost W(x, u^ε). Now, using the optimality of V^h(x), we have

\inf_u W(x, u) = V(x) = W(x, \bar u),
where the infimum is over all the U-valued measurable functions. Our primary concern is with getting a good approximation to V(·) and with
feedback controls which yield costs which are close to the infimum. The
numerical methods will give feedback controls, but in order to be able to
prove that the values of the costs which are given by the numerical algo-
rithms converge to the infimum of the costs as the approximation parameter
h converges to zero, we need to know that there is an optimal control in
some appropriate sense. In particular, as seen in the argument at the end
of the last section, we will need to know that there is a reasonable class
of admissible controls, and an optimal control in that class which can be
approximated by a ''nice" piecewise constant control, with arbitrarily small
penalty. The class of relaxed controls to be defined below was introduced
for just such a purpose [10, 152]. In particular,

(6.1)

Note that V(0) = 0. To see this, define the sequence of controls u^n(·) by

m_t(A) = I_A(u(t)), \qquad
m(A \times [0, t]) = \int_0^t m_s(A)\,ds.   (6.2)
We can now write (5.1) as

\dot x(t) = \int_U b(x(t), \alpha)\,m_t(d\alpha),

or as

x(t) = x + \int_0^t \int_U b(x(s), \alpha)\,m_s(d\alpha)\,ds.

m(A × [0, t]) is just the total integrated time over the interval [0, t] that the control u(·) takes values in the set A ⊂ U. A relaxed control is just a generalization of such m(·), and we now give the general definition.
\dot x(t) = \lim_{\delta \to 0} \frac{x(t+\delta) - x(t)}{\delta}
= \frac{1}{2}\big[b(x,\alpha_1) + b(x,\alpha_2)\big]
= \int_U b(x,\alpha)\,m_t(d\alpha).   (6.4)
With the use of a relaxed control, the set of possible velocities and cost rates (b(x,U), k(x,U)) is replaced by its convex hull. The relaxed control m(·) for which W(0, m) = 0 and x(t) = 0 in (6.1) is the one for which m_t(·) is concentrated on the points ±1, each with mass 1/2, for all t. •
\int_0^t \int_U \phi(\alpha, s)\,m^n(d\alpha\,ds) \to \int_0^t \int_U \phi(\alpha, s)\,m(d\alpha\,ds), \quad t < \infty,   (6.5)

where m(·) is the relaxed control with derivative m_t({−1}) = m_t({1}) = 1/2. If x^n(·) is the solution to (5.1a) under control u^n(·), then x^n(·) converges
to x(·), which satisfies (6.4) with α₁ = 1 and α₂ = −1. In fact, for this example, m^n([0, t] × {+1}) is just the total part of the time interval [0, t] on which u^n(s) = 1, and it equals t/2 + O(1/n).
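The chattering behavior is easy to see numerically. The sketch below (ours) integrates dx/dt = u^n(t) for the alternating controls u^n(·) and shows that the paths stay within O(1/n) of x(·) ≡ 0, the path produced by the relaxed control.

    import numpy as np

    # u^n(.) alternates between +1 and -1 on successive intervals of length 1/n.
    def chattering_path(n, T=1.0, steps_per_interval=10):
        dt = 1.0 / (n * steps_per_interval)
        t = np.arange(0.0, T, dt)
        u = np.where(np.floor(t * n) % 2 == 0, 1.0, -1.0)
        x = np.concatenate([[0.0], np.cumsum(u) * dt])   # exact for piecewise constant u
        return t, x[:-1]

    for n in (5, 50, 500):
        _, x = chattering_path(n)
        print(n, np.abs(x).max())    # maximum deviation from 0 is O(1/n)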
For any sequence of relaxed admissible controls, there is always a subse-
quence which converges in the sense of (6.5). The introduction of relaxed
controls has the effect of making the control appear essentially linearly in
the dynamics and cost function.
We saw in an example above that there might not be an optimal control
in the class of ordinary admissible controls. But there always is one in the
class of relaxed controls. More detail is in Chapters 9 and 10. The following
approximation result is important in applications because it says that any
admissible relaxed control can be well approximated by a "nice" ordinary
admissible control.
financial mathematics.
first exit time τ = min{t : x(t) ∉ (0, B)} and the cost functional

W(0) = W(B) = 0,

where L = (σ²/2)(d²/dx²) is the differential operator of the process x(·).
We will obtain the desired transition probabilities and interpolation interval simply by trying to solve the differential equation (1.2) by finite differences. The standard approximation

f_{xx}(x) \to \frac{f(x+h) + f(x-h) - 2f(x)}{h^2}   (1.3)

for the second derivative will be used. Now use (1.3) in (1.2), denote the result by W^h(x), and get (for x ∈ G_h^0)

Using N_h for the first escape time of the chain from the set G_h^0, the solution of (1.5) can be written as (see Chapter 2) a functional of the path of the chain in the following way:

W^h(x) = E_x \sum_{n=0}^{N_h-1} k(\xi_n^h)\,\Delta t^h.   (1.6)
W(x) = \int_0^\tau k(x(s))\,ds. Formally, W(·) solves the ODE (1.2), where now L = b(x)\,d/dx is just a first order differential operator. In particular, if W(·) is smooth enough we have

W_x(x)\,b(x) + k(x) = 0, \quad x \in (0, B),   (1.7)
W(0) = W(B) = 0.
Owing to the unique direction of flow for each initial condition, only one
of the two boundary conditions is relevant for all x E (0, B). As was done
with Example 1, the Markov chain approximation to x(·) will be obtained
by use of a finite difference approximation to the derivative in (1.7). But we
will need to choose the difference approximation to the derivative Wx(x)
carefully, if the finite difference equation is to have an interpretation in
terms of a Markov chain. Define the one sided difference approximations:
f_x(x) \to \frac{f(x+h) - f(x)}{h} \quad \text{if } b(x) \ge 0,
f_x(x) \to \frac{f(x) - f(x-h)}{h} \quad \text{if } b(x) < 0.   (1.8)
That is, if the velocity at a point is nonnegative, then use the forward
difference, and if the velocity at a point is negative, then use the backward
difference.
Such schemes are known as the "upwind" approximation method in numerical analysis. Define the positive and negative parts of a real number b by b⁺ = max(b, 0) and b⁻ = max(−b, 0), so that b = b⁺ − b⁻ and |b| = b⁺ + b⁻.
W^h(x) = W^h(x+h)\,\frac{b^+(x)}{|b(x)|} + W^h(x-h)\,\frac{b^-(x)}{|b(x)|} + k(x)\,\frac{h}{|b(x)|}
= W^h(x+h)\,p^h(x, x+h) + W^h(x-h)\,p^h(x, x-h) + k(x)\,\Delta t^h(x).   (1.10)

The p^h(x, y) define a Markov chain {ξ_n^h, n < ∞} on S_h. If inf_x |b(x)| ≠ 0, then the chain, together with the interpolation interval h/|b(x)|, is locally consistent with the "process" defined by the solution to ẋ = b(x), in that (4.1.3) holds. In particular, E_{x,n} Δξ_n^h = b(x)Δt^h(x) and cov_{x,n} Δξ_n^h = O(h²) = o(Δt^h(x)).
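A small sketch of the upwind construction (1.10) for ẋ = b(x): b and k below are illustrative choices of ours, with inf_x |b(x)| > 0 and b(x) > 0 throughout, so that p^h(x, x−h) = 0 and a single backward pass solves the equation.

    import numpy as np

    # Upwind chain: p^h(x, x+h) = b+(x)/|b(x)|, p^h(x, x-h) = b-(x)/|b(x)|,
    # dt^h(x) = h/|b(x)|, cf. (1.10).
    B, h = 1.0, 0.01
    b = lambda x: 1.0 + 0.5 * np.sin(2 * np.pi * x)   # strictly positive
    k = lambda x: 1.0
    xs = np.arange(0.0, B + h/2, h)
    W = np.zeros_like(xs)                             # W(B) = 0 is the relevant condition
    for i in reversed(range(len(xs) - 1)):            # one backward pass: b > 0 means
        W[i] = W[i+1] + k(xs[i]) * h / abs(b(xs[i]))  # p_up = 1, p_down = 0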
W^h(x) = p^h(x,x)\,W^h(x) + \big(1 - p^h(x,x)\big)\frac{b^+(x)}{|b(x)|}\,W^h(x+h)
+ \big(1 - p^h(x,x)\big)\frac{b^-(x)}{|b(x)|}\,W^h(x-h) + \big(1 - p^h(x,x)\big)\,\Delta t^h(x)\,k(x).   (1.10')
Define the new transition probabilities

where N_h is again the first escape time of the chain from G_h^0. If x₀ ∈ (0, B), then the process might get stuck at x₀ with value W^h(x₀) = 0 if k(x₀) = 0, and W^h(x₀) = ±∞ otherwise, according to the sign of k(x₀). It turns out that W^h(x) → W(x), and the interpolated processes ξ^h(·) and ψ^h(·) both converge to the solution to ẋ = b(x). Thus, the finite difference method automatically gives a chain which satisfies the consistency conditions and can be used to approximate the original process as well as functionals of it.
\inf_x \big(\sigma^2(x) + |b(x)|\big) > 0.
If this last restriction does not hold, then we can continue by adding transitions from appropriate states x to themselves, as discussed in Example 2. Let W(x) be defined as in Example 1. Then, if W(·) is smooth enough, Itô's formula implies (1.2), where L is the differential operator of the process (1.11). In particular,

W(0) = W(B) = 0.

Use the finite difference approximations (1.3) and (1.8), and again let W^h(x) denote the finite difference approximation. [A possible alternative to (1.8) is given in (1.18) below.] Substituting these approximations into (1.12), collecting terms, multiplying by h², and dividing all terms by the coefficient of W^h(x) yields the approximating equation
for the first derivative in lieu of (1.8). Then repeating the procedure which led to (1.14) yields the following finite difference equation and new transition probabilities and interpolation interval:

W^h(x) = \frac{\sigma^2(x) + h\,b(x)}{2\sigma^2(x)}\,W^h(x+h) + \frac{\sigma^2(x) - h\,b(x)}{2\sigma^2(x)}\,W^h(x-h) + k(x)\,\frac{h^2}{\sigma^2(x)}
= p^h(x, x+h)\,W^h(x+h) + p^h(x, x-h)\,W^h(x-h) + k(x)\,\Delta t^h(x).   (1.19)
Local consistency can be shown as in (1.15) and (1.16).
If (1.17) does not hold, then the p^h(x, x ± h) in (1.18) are not all nonnegative. Thus, they cannot serve as transition probabilities and, in fact, serious instability problems might arise.
Again, let τ = min{t : x(t) ∉ (0, B)} and define the cost function

p^h(x, x+h|\alpha) = \frac{\sigma^2(x)/2 + h\,b^+(x,\alpha)}{\sigma^2(x) + h\,|b(x,\alpha)|},
For y ≠ x ± h, set p^h(x, y|α) = 0. Then the constructed p^h are transition probabilities for a controlled Markov chain. Local consistency of this chain and interpolation interval can be shown exactly as for Example 3. Also, by following the procedure which led to (1.13), we see that the formal finite difference form of (1.21) is just

W^h(x, u) = E_x^u \sum_{n=0}^{N_h-1} k\big(\xi_n^h, u(\xi_n^h)\big)\,\Delta t^h\big(\xi_n^h, u(\xi_n^h)\big) + E_x^u\, g\big(\xi_{N_h}^h\big).
V^h(x) = \min_{\alpha \in U^h}\Big[\sum_y \bar p^h(x, y|\alpha)\,V^h(y) + k(x,\alpha)\,\Delta \bar t^h(x,\alpha)\Big],   (1.24)

where

\Delta \bar t^h(x) = \frac{h^2}{\sigma^2(x) + h\,B(x)}.
The p̄^h(x, y|α) might sum (over y) to less than unity for some x and α. To
It can be readily shown that the chain associated with the transition probabilities p̄^h(x, y|α) and interpolation interval Δt̄^h(x) is locally consistent with (1.20). The difference between the barred and the unbarred values is O(h). The symmetric finite difference approximation (1.18) to the first derivative can also be used if

It will be seen below that the use of (2.1) and (2.2) is equivalent to the use of (1.22). They yield the same cost functions and minimum cost.
the convergence. Suppose that the p̄^h(x, y|α) defined in (2.1) and (2.2) are used for the p^h(x, y|α) in (1.23) or (1.24). We have p̄^h(x, x|α) = O(h) ≠ 0. The difference from zero is not large, but numerical studies suggest that even in this case there is an advantage to eliminating the transition of a state to itself. When approximating deterministic optimal control problems the improvement can be very substantial (see Section 15.4). Let us see how this can be done.
\bar W^h(x, u) = \frac{\sum_{y \ne x} \bar p^h(x, y|u(x))\,\bar W^h(y, u) + k(x, u(x))\,\Delta \bar t^h(x, u(x))}{1 - \bar p^h(x, x|u(x))},

or, equivalently,
or, equivalently,
Comparing (2.4) to (1.23), we see that W̄^h(x, u) = W^h(x, u) for all feedback controls u(·) for which (1.23) or (2.4) has a unique solution. Thus one can use either equation with the same results. This procedure for eliminating the transitions from any state to itself is called normalization.
These observations apply to any situation where we start with a transition probability under which some states communicate to themselves. Just to complete the picture, suppose that we are given a locally consistent chain with transition probabilities and interpolation interval p̄^h(x, y|α) and Δt̄^h(x, α), respectively, where p̄^h(x, x|α) might be positive for some x and α. Let W̄^h(x, u) denote the associated value of the cost. Then

p^h(x, y|\alpha) = \frac{\bar p^h(x, y|\alpha)}{1 - \bar p^h(x, x|\alpha)}, \qquad
\Delta t^h(x, \alpha) = \frac{\Delta \bar t^h(x, \alpha)}{1 - \bar p^h(x, x|\alpha)},   (2.5)
and set p^h(x, x|α) = 0. The chain with this new transition probability and interpolation interval obviously yields the same cost as does the original chain [under the same control u(·)] if the solution to (2.3') is unique, because by construction of the p^h(x, y|α) and Δt^h(x, α), the equations for the cost are the same in both cases. It is readily verified that (2.5) yields a locally consistent chain.
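The normalization (2.5) is a one-line transformation; here is a sketch with a made-up transition list (the function name and values are ours).

    # Normalization (2.5): remove the self-transition at a state and rescale
    # the interpolation interval accordingly.
    def normalize(p_bar, dt_bar, x):
        # p_bar: dict y -> probability, possibly including y == x.
        stay = p_bar.get(x, 0.0)
        assert stay < 1.0
        p = {y: q / (1.0 - stay) for y, q in p_bar.items() if y != x}
        return p, dt_bar / (1.0 - stay)

    p, dt = normalize({0.0: 0.1, 0.01: 0.5, -0.01: 0.4}, 1e-4, 0.0)
    print(p, dt)    # {0.01: 5/9, -0.01: 4/9}, dt stretched by 1/0.9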
or

u_{n+1}(x) = \arg\min_{\alpha \in U^h}\Big[\sum_y p^h(x, y|\alpha)\,W^h(y, u_n) + k(x,\alpha)\,\Delta t^h(x,\alpha)\Big],   (2.7b)
where the functions are appropriately bounded and smooth. Similar considerations work for the general r-dimensional problem. First try the central difference based rule:

If this can't be used, then try a compromise between the central difference and some one sided form; e.g.,

L^\alpha f(x) = f_x(x)\,b(x) + f_x(x)\,\alpha + \frac{\sigma^2(x)}{2}\,f_{xx}(x).   (2.9)
Each of the two terms involving the first derivative will be approximated separately. We will use the difference approximation

f_x(x)\,\alpha \to \alpha\,\frac{f(x+h) - f(x)}{h} \quad \text{for } \alpha \ge 0,
f_x(x)\,\alpha \to \alpha\,\frac{f(x) - f(x-h)}{h} \quad \text{for } \alpha < 0.   (2.10b)
Following the procedure of Example 4, but using (2.10) for the first derivative terms, yields the expression (1.23) but with the following new definitions of the transition probabilities and interpolation interval: Define the normalization Q^h(x, α) = σ²(x) + h|α| + h|b(x)| and

The chain defined by (2.11) is locally consistent with the controlled process defined by (2.8).
The transition probabilities and interpolation interval in (2.11) differ from (1.22) by having the absolute values of α and b(x) separated in the denominators and (α + b(x))^± being replaced by α^± + b(x)^± in the numerators, an advantage from the coding point of view. The denominators still have an α-dependence. This can be eliminated via the method which led to (2.1), (2.2). To do this, define the maximum of Q^h(x, α) over α ∈ U = [0, 1], namely, Q̄^h(x) = σ²(x) + h + h|b(x)|. Then use

The chain defined by (2.12) is locally consistent with the controlled diffusion (2.8).
Solving for the Minimum in (2.7). The form of the transition probabilities given in (2.12) is convenient for getting the minimum in expressions such as (2.7b), and we will go through the calculation for our special case. The expression (2.7b) is equivalent to:

then we can use u_{n+1}(x) = 1 for the minimizing control value. In general, we can conclude that
symmetric finite difference for one of the first derivative terms, as follows. Suppose that σ²(x) − h ≥ 0. Then use the one sided finite difference for the representation of W_x(x, u)b(x) and the symmetric finite difference for the approximation of W_x(x, u)α at x. With the use of this approximation, we still get a Markov chain which is locally consistent with (2.8), and the calculation of the infima in (2.7) is still relatively simple. The transition probabilities are given by

p^h(x, x \pm h|\alpha) = \frac{\sigma^2(x)/2 + h\,b^\pm(x) \pm h\alpha/2}{\sigma^2(x) + h\,|b(x)|}.

The other p^h(x, y|α) are zero, and
Let e_i denote the unit vector in the i-th coordinate direction and let ℝ_h^r denote the uniform h-grid on ℝ^r; i.e., ℝ_h^r = {x : x = h Σ_i e_i m_i, m_i = 0, ±1, ±2, ...}. Until further notice, we use S_h = ℝ_h^r as the state space of the approximating Markov chain. It is not necessary to use such a uniform grid, but it simplifies the explanation.
Recall the procedure used for the scalar cases in Sections 5.1 and 5.2. Our aim in this and in the following sections is just to get an approximating Markov chain which is locally consistent with (3.1). The procedure of Section 5.1 started with a partial differential equation of the type (3.3):

L^{u(x)}\,W(x, u) + k(x, u(x)) = 0,   (3.3)
where the function k(·) played only an auxiliary role. Then suitable finite
difference approximations for the derivatives at the point x were substi-
tuted into (3.3), terms collected, and all terms divided by the coefficient of
Wh(x,u). We use Wh(x,u) to denote the finite difference approximation
to (3.3). This procedure gave the transition probabilities and interpolation
interval automatically, as the coefficients in the resulting finite difference
equation. We follow the same procedure and show that it works, together
with all of the variants discussed in Sections 5.1 and 5.2.
The finite difference approximations used in this section are simple. E.g.,
only derivatives along the coordinate directions are taken, and the grid is
uniform. Other approximations might be useful in particular situations and
some further comments appear below. The method is a special case of the
approach in the next section.
The Diagonal Case. For ease of explanation, we first write the expressions for the case where a_ij(x) = 0 for i ≠ j. Proceed as in Section 5.1, Example 4. As in that example, the boundary conditions and boundary cost play no role in getting the chain, and we ignore them. For the second derivative, we use the standard approximation

f_{x_i x_i}(x) \to \frac{f(x + e_i h) + f(x - e_i h) - 2f(x)}{h^2}.   (3.4)
For the approximations to the first derivatives, we can adapt any of the previously discussed schemes. Let us start with the use of one sided approximations analogous to (1.8), namely,

f_{x_i}(x) \to \frac{f(x + e_i h) - f(x)}{h} \quad \text{if } b_i(x,\alpha) \ge 0,
f_{x_i}(x) \to \frac{f(x) - f(x - e_i h)}{h} \quad \text{if } b_i(x,\alpha) < 0.   (3.5)

Note that the direction (forward or backward) of the finite difference approximation (3.5) is the direction of the velocity component b_i(x, α). Let x ∈ S_h.
Use (3.4) and (3.5) in (3.3), denote the solution by W^h(x, u), collect terms, clear the fraction by multiplying all terms by h², then divide all terms by the coefficient of W^h(x, u) to get the finite difference equation

where

p^h(x, x \pm e_i h|\alpha) = \frac{a_{ii}(x)/2 + h\,b_i^\pm(x,\alpha)}{Q^h(x,\alpha)}, \qquad
\Delta t^h(x,\alpha) = \frac{h^2}{Q^h(x,\alpha)},   (3.7)
and suppose that Δt^h(x, α) = O(h). For y ∈ S_h not of the form x ± he_i for some i, set p^h(x, y|α) = 0.
The expressions (3.7) reduce to (1.22) when x(t) is real valued. The constructed p^h(x, y|α) are nonnegative. For each x and α, they sum (over y) to unity. Thus, they are transition probabilities for a controlled Markov chain. The local consistency of (3.7) with x(·) is easy to check. In particular,
Thus, for the case where the a(·) matrix is diagonal, the finite difference
method easily yields a locally consistent set of transition probabilities. All
the variations discussed in Section 5.2 can be used, and will be illustrated
in a two dimensional example in Subsection 5.3.2 below and at the end of
the next section.
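For the diagonal case, the construction (3.7) reduces to a few lines of code. The sketch below assumes the normalization Q^h(x,α) = Σ_i a_ii(x) + h Σ_i |b_i(x,α)|, the form suggested by the scalar case (1.22); the function and demonstration inputs are ours.

    import numpy as np

    # Transition probabilities (3.7) for the diagonal case a_ij = 0, i != j.
    def diagonal_chain(x, alpha, h, a_diag, b):
        aii = np.asarray(a_diag(x), float)
        bx = np.asarray(b(x, alpha), float)
        Q = aii.sum() + h * np.abs(bx).sum()
        probs = {}
        for i in range(len(bx)):
            probs[(i, +1)] = (aii[i]/2 + h * max(bx[i], 0.0)) / Q
            probs[(i, -1)] = (aii[i]/2 + h * max(-bx[i], 0.0)) / Q
        return probs, h*h / Q

    probs, dt = diagonal_chain(np.zeros(2), 0.0, 0.01,
                               lambda x: [1.0, 2.0],
                               lambda x, a: [0.3, -0.7])
    print(sum(probs.values()), dt)   # probabilities sum to one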
f_{x_i x_j}(x) \to \big[2f(x) + f(x + e_i h + e_j h) + f(x - e_i h - e_j h)\big]/2h^2
- \big[f(x + e_i h) + f(x - e_i h) + f(x + e_j h) + f(x - e_j h)\big]/2h^2.   (3.10)
If a_ij(x) < 0, we will use

for all i, x. The condition (3.12) depends on the coordinate system which is used, and several ways of relaxing it will be discussed later in this section and in the next section. Define the normalizing coefficient

(3.14)
to check. For example, letting ξ_n^{h,i} denote the i-th component of the vector ξ_n^h, we have for i ≠ j,

E_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,i}\,\Delta\xi_n^{h,j} = \frac{2h^2\,a_{ij}(x)}{2\,Q^h(x,\alpha)} = a_{ij}(x)\,\Delta t^h(x,\alpha).
The different choices for the finite difference approximations to the mixed second order derivatives are made in order to guarantee that the coefficients of the off-diagonal terms p^h(x, x + e_i h ± e_j h|α), p^h(x, x − e_i h ± e_j h|α), i ≠ j, are nonnegative. Also these choices guarantee that the coefficients sum to unity, so that they can be considered to be transition probabilities for an approximating Markov chain.
but with h_i e_i replacing the he_i. Let c = h₂/h₁. Then one can show that the resulting coefficients are the transition probabilities and interpolation interval for a Markov chain which is locally consistent with x(·) provided that

a_{11} - a_{12}/c > 0, \qquad a_{22} - c\,a_{12} > 0.   (3.16)

As shown in [94, Section 5], any value of c satisfying 2 < c < 2.5 will yield a locally consistent chain. In the next section, it will be shown how to relax (3.12) further with the use of transitions to nonneighboring states.
(3.17)

The values of the transition probability which were just defined might sum (over y) to less than unity for some values of x and α. To compensate for this, we allow each state x to communicate with itself with the probability

The new transition probabilities and interpolation interval are also locally consistent with (3.1).
dx_1 = b_1(x)\,dt,
dx_2 = b_2(x)\,dt + u\,dt + \sigma(x)\,dw.   (3.20)

Writing the operator L^α defined by (3.2) in the split up form suggested by (2.9), we can write (3.3) as
Let the control space be U = [−1, 1]. Now use (3.4) for the second derivative W_{x_2 x_2}(x, u), and use the one sided approximation (3.5) for W_{x_1}(x, u). Approximate each of the W_{x_2}(x, u) terms separately. For each of these W_{x_2}(x, u) terms, use either the one sided or the two sided (symmetric) difference approximation, depending on the magnitudes of the coefficients. In particular, let σ²(x) ≥ 1 for all x, and use the one sided approximation for W_{x_2}(x)b_2(x). Then the two sided approximation can be used for W_{x_2}(x, u)α and we get for h ≤ 1

p^h(x, x \pm e_2 h|\alpha) = \frac{\sigma^2(x)/2 + h\,b_2^\pm(x) + h\,\alpha^\pm}{Q^h(x,\alpha)},   (3.22)

\Delta t^h(x) = \frac{h^2}{Q^h(x,\alpha)}, \qquad
Q^h(x,\alpha) = \sigma^2(x) + h|b_1(x)| + h|b_2(x)| + h|\alpha|.
Indeed, one should not rely too heavily on direct finite difference type meth-
ods, unless clearly convenient, without understanding their relationship to
the approach of this section.
and

\sum_y (y - x)(y - x)'\,p_b^h(x, y|\alpha) = o\big(\Delta t_b^h(x,\alpha)\big),
\sum_y (y - x)\,p_b^h(x, y|\alpha) = b(x,\alpha)\,\Delta t_b^h(x,\alpha) + o\big(\Delta t_b^h(x,\alpha)\big).   (4.4)
We next show how to combine (4.3) and (4.4) to get the desired transition probabilities and interpolation intervals p^h(x, y|α), Δt^h(x, α) in (3.7). This is done via a "coin toss." Choose system (4.2a) [i.e., the transition probabilities p_a^h(x, y|α) giving (4.3)] with probability p^h(a|x, α), and the system (4.2b) [i.e., the transition probabilities giving (4.4)] with probability p^h(b|x, α) = 1 − p^h(a|x, α), where p^h(a|x, α) is to be determined. The transition probabilities determined by the coin toss will be the p^h(x, y|α). Dropping the o(·) terms, local consistency with (3.1) requires that

(4.5a)

\sum_y (y - x)\,p^h(x, y|\alpha) = b(x,\alpha)\,\Delta t^h(x,\alpha).   (4.5b)

Equations (4.5) imply that

p^h(a|x,\alpha) = \frac{\Delta t_b^h(x,\alpha)}{\Delta t_b^h(x,\alpha) + \Delta t_a^h(x,\alpha)}.
This implies that

To gain confidence in the formulas (4.6) to (4.9), let us see how (3.7) can be recovered. Define

Q_a^h(x,\alpha) = \sum_i a_{ii}(x), \qquad Q_b^h(x,\alpha) = \sum_i |b_i(x,\alpha)|,

and use the locally consistent [with (4.2a) and (4.2b), respectively] transition probabilities and interpolation intervals

p_a^h(x, x \pm h e_i|\alpha) = \frac{a_{ii}(x)}{2\,Q_a^h(x,\alpha)}, \qquad
p_b^h(x, x \pm h e_i|\alpha) = \frac{b_i^\pm(x,\alpha)}{Q_b^h(x,\alpha)},   (4.10)

and

\Delta t^h(x,\alpha) = \frac{h^2}{Q_a^h(x,\alpha) + h\,Q_b^h(x,\alpha)},

which are precisely the forms given by (3.7) or, equivalently, by (4.8) and (4.9). The general case follows the same lines.
It is generally the case that the individual interpolation intervals have the representations in (4.10), for appropriate Q_i^h(x, α). These representations imply the form of the interpolation interval (4.9) (where the drift component is weighted by h in calculating the normalization), and the form of the transition probabilities in (4.8).
Two special cases which illustrate variations of this procedure will be
described in the next two subsections.
The degenerate structure of the noise covariance matrix suggests that the
part of the transitions of any approximating Markov chain which approxi-
mates the effects of the "noise" would move the chain in the directions ±v.
We next pursue the decomposition approach of Example 1. Let the state space S_h be that indicated by the extension of the grid in Figure 5.1b. The "boxes" are such that the diagonals are in the directions ±v and of magnitude |v|h. Given h > 0, the grid has spacing h in the e₁ direction and β₂h in the e₂ direction.
One set of transition probabilities for a locally consistent chain for the component of (4.11) which is represented by

dx = q\,v\,dw   (4.12)

Thus, p_a^h(x, y|α), together with the interpolation interval Δt_a^h(x, α) = h²/q², is locally consistent with (4.12). Hence Q_a^h(x, α) = q², and the unnormalized weights for x ± vh are q²/2 each.
One possibility for the transition probability for the approximation to (4.2b) is

p_b^h(x, x \pm \beta_i e_i h|\alpha) = b_i^\pm(x,\alpha) \times \text{normalization},   (4.13)

so that

\sum_y (y - x)\,p_b^h(x, y|\alpha) = b(x,\alpha)\,\Delta t_b^h(x,\alpha),

where

Q^h(x,\alpha) = q^2 + \frac{h\,|b_1(x,\alpha)|}{\beta_1} + \frac{h\,|b_2(x,\alpha)|}{\beta_2},
\Delta t^h(x,\alpha) = \frac{h^2}{Q^h(x,\alpha)}.
We only need to check whether the resulting expressions for the transition
probabilities are all nonnegative.
5.4.3 Example 3
We next deal with a special two dimensional case with a nondegenerate
covariance. Refer to Figure 5.2a for an illustration of the terms and the
specification of the values of f3i·
f3t2 = .5
Let there be vectors (as in the figure) Vi of the form Vt = (f3n, f3t2) and
v2 = (- f32t, /322), where we set f3n = /322 = 1. Assume that there are
positive real numbers q_i such that a(x) = a = q_1^2 v_1 v_1' + q_2^2 v_2 v_2'. Thus, we can write the process as

(4.14)

where the w_i(·) are mutually independent one dimensional Wiener processes.
If the interpolation time interval is defined by h²/Q^h(x, α), then the chain is locally consistent with the systems model of this example. Because the transitions are not only to the nearest neighbors, more memory is required.

(4.15)

where p^h(x, y|α) are the transition probabilities and Δt^h(x, α) → 0 as h → 0.
A convenient way of getting the transition probabilities follows the approach of the above examples by treating the transition probabilities for the noise and drift components separately and then adding them up with the appropriate weight and normalization factor. Suppose that q_i^a(x) and q_i^b(x, α) are nonnegative (see the remarks on extensions below) and satisfy

\sum_{i \in M(x)} q_i^a(x)\,v_i(x) = 0.   (4.17)

Define

Q^h(x,\alpha) = \sum_{i \in M(x)} \big[h\,q_i^b(x,\alpha) + q_i^a(x)\big].   (4.18)
If the covariance a(x) depends on the control, then so would the q_i^a(·). Define the interpolation interval

(4.19)
and suppose that it goes to zero as h → 0. Then (4.19) and the transition probabilities defined by
probabilities defined by
(4.20)
(4.21)
dx = b(x)\,dt + dw,

w(t) = \begin{pmatrix} w_1(t) \\ w_2(t) + v\,w_3(t) \end{pmatrix}.

With this decomposition, and the grid spacing depending on the direction, we can easily get a locally consistent approximation.
not affect the limit process. This point will be discussed further in Chapter
10. The probabilistic interpretation comes to our aid here also and allows
us to treat a relaxation of the consistency and continuity conditions which
might be difficult to achieve via the analytic methods of numerical analysis.
The problem occurs in practice in situations such as the following. Consider
a problem where the set G of interest for the control problem is divided into disjoint subsets {G₁, ...}, and we wish to have a different "level" of approximation on each G_i. For example, a different spacing might be used
for the approximating grid on each subset. Suppose that the state space or
approximating grid is Sh and that we can find a suitable locally consistent
chain on the parts of Sh which are interior to each Gi. Due to the discon-
tinuities in the grid sizes, it might not be possible to get local consistency
at all points on the boundaries separating the Gi.
The point will be illustrated via a particular two dimensional example
with Sh shown in Figure 5.3a.
[Figure 5.3a: the grid S_h, with different spacings on the subsets G₁ and G₂.]
"symmetric" way that the other points do. The main difficulties as well as
their resolution can be seen via a particular example.
dx_1 = b_1(x,\alpha)\,dt,
dx_2 = b_2(x,\alpha)\,dt + \sigma\,dw, \qquad \sigma > 0.   (5.1)
q_1^b(y_0, \alpha) = 2b_1^+(y_0,\alpha), \quad
q_2^b(y_0, \alpha) = 2b_2^+(y_0,\alpha), \quad
q_3^b(y_0, \alpha) = 2b_1^-(y_0,\alpha), \quad
q_4^b(y_0, \alpha) = b_2^-(y_0,\alpha)/2, \quad
q_5^b(y_0, \alpha) = b_2^-(y_0,\alpha)/2.
where q_1^a(y_6) and q_2^a(y_6) are positive. In order not to introduce a very large value of E Δξ_n^h, it is required that the mean change implied by (5.4) is zero; i.e.,

(5.5)

This is a critical condition and implies that q_1^a(y_6) = 4\,q_2^a(y_6). Choose

Then

a = \frac{\sigma^2}{2}\big[4\,v_1(y_6)v_1'(y_6) + v_5(y_6)v_5'(y_6) + v_9(y_6)v_9'(y_6)\big],   (5.6)

or

\bar a = \sigma^2\,\mathrm{diag}(1/6,\,1).
\operatorname{var}_{x,n}^{h,\alpha}\big(\Delta\xi_n^h\big)_2 \ge k_1 h^2,   (5.7)

where k₁ > 0 and (Δξ_n^h)₂ is the component in the vertical direction. In fact, the properties in the last sentence are the most important, provided that the "drift" |E_{x,n}^{h,\alpha}\,\Delta\xi_n^h|/\Delta t_n^h is bounded in a neighborhood of the boundary. Otherwise, the form of the approximation used on A₀ is not important.
(6.3)
The term N(t, H) is just the number of jumps with values in the set H on the interval [0, t], and N(t, Γ) is the number of jumps on the interval [0, t].
P\{x(\cdot) \text{ jumps on } [t, t + \Delta)\,|\,x(s), w(s), N(s), s \le t\} = \lambda\Delta + o(\Delta).   (6.4)

Define

\bar\Pi(x, H) = \Pi\{\rho : q(x, \rho) \in H\}.

By the independence properties and the definition of ρ_n, for H ⊂ Γ we have

P\{x(t) - x(t^-) \in H\,|\,\text{jump at } t,\, x(t^-) = x,\, w(s), x(s), N(s), s < t\}
= \Pi\{\rho : q(x(t^-), \rho) \in H\} = \bar\Pi(x(t^-), H).   (6.5)
is analogous to what was required by (4.1.3) for the diffusion model. The only difference between the cases of Section 4.1 [i.e., (3.1)] and our case (6.1) is the presence of the jumps. Let Q^h(·) be a bounded (uniformly in h) measurable function such that

where β > 0 and N_h is the first exit time from the set G_h^0 = S_h ∩ G^0. Then the dynamic programming equation is, for x ∈ G_h^0,

where the jumps in J(·) are either ±1, each with probability 1/2. Let 1/h be an integer, and suppose that the points in S_h are h units apart in the e₂ direction. Then the integral in the above expression is just

(6.8)
Using the values of E_n^h\,\Delta\xi_n^h from Section 4.3, the last equation can be written as

\xi_n^h = x + \sum_{i=0}^{n-1}\big[b(\xi_i^h, u_i^h)\,\Delta t_i^h + o(\Delta t_i^h)\big]\,I_{H_i^h} + M_n^h + J_n^h,   (6.9)

where M_n^h and J_n^h are the middle and right hand sums, respectively, in the previous equation. Due to the centering of its terms about the condi-
For each t,

E[\text{number of } n : \nu_n^h \le t] \to \lambda t

as h → 0. This implies that we can drop the I_{H^h} in (6.9) and (6.10) with no effect in the limit. Using this reasoning, and the reasoning which led to the representation (4.3.9), we can write

(6.11)

where γ(s) ∈ r(x(s)) almost surely (ω, s) with respect to the random measure induced by |z|(·).
We next list the assumptions on r( ·) that will be used. Following the
statements of the assumptions is a discussion of their significance and ref-
erences to the literature.
The following conditions are assumed to hold. These conditions will be
used in the convergence theorems of Chapter 11.
(i) For each x, the positive cone generated by the vectors in r(x) is convex.

(ii) The set G can be constructed as the intersection of a finite number of "smooth" sets in the following way. There are a finite number of continuously differentiable functions g_i(·) and sets G_i = {x : g_i(x) ≤ 0} whose boundaries are ∂G_i = {x : g_i(x) = 0} and such that G = ∩_i G_i. The sets G and each G_i are assumed to be the closure of their interiors.

(iii) Suppose that for a given x ∈ ∂G there is a single index i = i(x) such that g_i(x) = 0, and let n(x) denote the interior normal to ∂G_{i(x)} at x. Then the inner product of all the vectors in r(x) with n(x) is positive.

(iv) Define the index set I(x) = {i : x ∈ ∂G_i}. I(x) is upper semicontinuous in the sense that if x ∈ ∂G, there is δ > 0 such that |x − y| < δ implies that I(y) ⊂ I(x). Next, suppose that x ∈ ∂G lies in the intersection of more than one boundary; i.e., I(x) has the form I(x) = {i(1), ..., i(k)} for some k > 1. Let N(x) denote the convex hull of the interior normals n_{i(1)}, ..., n_{i(k)} to ∂G_{i(1)}, ..., ∂G_{i(k)} at x. Let there be some vector v ∈ N(x) such that γ'v > 0 for all γ ∈ r(x).
In (7.1) and (7.2), r(x) needed to be defined only for x ∈ ∂G. For our purposes of "approximation," the definition needs to be extended so that r(x) is defined and satisfies an appropriate "upper semicontinuity" condition
where the Y_i(·) are continuous, nondecreasing, satisfy Y_i(0) = 0, and can increase only at t where x(t) ∈ ∂G_i. The representation (7.3) is not necessarily unique without further conditions, in the sense that z(·) might not determine Y(·) uniquely. If the covariance matrix a(·) is uniformly nondegenerate [70], [100, Chapter 4], then the contribution to z(·) is zero (with probability one) during the times that x(t) ∈ ∂G. Then z(·) determines Y(·) with probability one. This issue will be returned to in Chapter 11.
Conditions (i) and (v) are standard in the literature, although they may
be implied rather than explicitly stated. They are related to the fact that
reflecting diffusions often appear as weak limits. For example, in the "heavy
traffic" models of Chapter 8 the set G is usually the nonnegative orthant in
some Euclidean space, and r(x) is independent of x and single valued on the
relative interior of each "face" of G. The definition of r(x) is then extended
to all of {)G by the sort of "semicontinuity" assumed in (v) together with
the convexity assumed in (i). Assumption (ii) describes the regularity of the
boundary. It is formulated in such a way that is covers both the classical
case in which oG is a smooth manifold as well as the setting in which
G is a polytope. The formulation of conditions (i), (ii) and (v) follows
[41], which considers questions of strong existence and uniqueness. Related
works that also deal with strong existence and uniqueness of solutions are
[69, 115, 134, 147].
The most interesting conditions are (iii) and its extension in (iv). Ba-
sically, these assumptions guarantee that the reflection term must always
point "into" the domain G. They allow a key estimate on the reflection term
of the process in terms of the drift and diffusion components. This estimate
will be exploited in Chapter 11 to prove the weak convergence of processes
that is needed for the convergence proofs for the numerical schemes. This
type of assumption has been used previously in proving existence in [42].
For the special case when G is the r-dimensional nonnegative orthant in ℝ^r and r(x) is independent of x and single valued on the relative interior of each "face" of ∂G, the condition (iv) is equivalent to the "completely-S" condition used in [149, 148]. In these papers the condition has been used to prove weak existence and uniqueness for the special case of a reflecting Wiener process (i.e., b(·,·) = 0 and σ(·) = σ).
As pointed out in Chapter 1, the term "constrained process" is often more
appropriate than the term "reflected process." In many of the applications
in which reflecting diffusion models arise, the reflection term is actually
a constraining term. For example, in the so-called "heavy traffic" models
discussed in Chapter 8, the reflection term is what keeps the buffers of cer-
tain queues from either becoming negative or overflowing. This is more of
a constraint than a reflection, in the physical sense. Reflecting boundaries
are often artificially introduced in order to get a bounded region in which
the numerical approximation can be carried out. Many control problems
are originally defined in an unbounded space, which is inconvenient for nu-
merical purposes. One tries to truncate the space in such a way that the
essential features of the control problem are retained. We can do this by
simply stopping the process when it first leaves some fixed and suitably
large set. One must then introduce some boundary condition or cost on the
stopping set. The objective is to choose a boundary cost that is believed to
be close to whatever the cost would be at these points for the untruncated
process. In particular, we do not want the chosen boundary and boundary
condition to seriously distort the optimal control at points not near that
Let {ξ_n^h, n < ∞} be a Markov process with the transition probabilities p^h(x, y) on ∂G_h^+ and p^h(x, y|α) on G_h. We say that the transition function p^h(x, y) is locally consistent with the reflection directions r(·) if there are δ > 0 and c_i > 0 such that for all x ∈ ∂G_h^+ and all h,
[Figure 5.5. The boundary approximation: the set ∂G_h^+ lies just outside G.]
The conditions (7.4) are needed for the convergence theorem in Chapter 11. For any fixed value of h, we would try to choose the p^h(x, y) for x ∈ ∂G_h^+ so that the behavior of the chain on ∂G_h^+ copies as closely as possible the behavior of the physical model on ∂G. This is particularly helpful when there is nonuniqueness of the directions. See the comments at the end of Example 2 below.
The Interpolated Process ψ^h(·). For x ∈ ∂G_h^+, we define the interpolation interval Δt^h(x) to be zero. The reason for this is the correspondence of the set ∂G_h^+ with the reflecting boundary ∂G, and the fact that the role of the reflection is to merely keep the process from leaving the desired state space G. Indeed, the "instantaneous" character is inherent in the definition of the reflected diffusion (7.1), (7.2).

Define τ_n^h as in Section 4.3, but use the interpolation interval Δt^h(x) = 0 for x ∈ ∂G_h^+. Thus, if ξ_n^h ∈ ∂G_h^+ (i.e., n is the time of a "reflection" step), then τ_{n+1}^h = τ_n^h. Define the continuous time parameter Markov chain interpolation ψ^h(·) by (4.3.1'). Note that the expression (4.3.1) cannot be used to define ψ^h(·), because (4.3.1) is multi-valued at those n which are reflection steps. An alternative to (4.3.1') is: ψ^h(τ_n^h) = ξ_{m(n)}^h, where m(n) =
max{i : τ_i^h = τ_n^h}. Thus the values of the states at the moments of the reflection steps do not appear in the definition of the interpolation. In this sense these states are "instantaneous."
The reflecting states can actually be removed from the problem formu-
lation by using the multistep transition functions which eliminate them. It
is often useful to retain them to simplify the coding of the computational
algorithms for the solution of the dynamic programming equations of, e.g.,
Section 5.8 below, as well as to facilitate the convergence proofs.
5.7.4 Examples
Example 1. Getting a transition function which is locally consistent with
the boundary reflection is often quite straightforward, as will be illustrated
by three simple examples. First consider the case where G is the rectangular
area in Figure 5.6. The reflection direction at the point x, denoted by r(x),
is the vector of unit length which has direction (2, 1). Let the state space S_h be the regular h-grid. The simplest choice for p^h(x, y) is obviously p^h(x, x + (h, 0)) = p^h(x, x + (h, h)) = 1/2. Here, we have obtained the proper average direction by randomizing among two "allowed" directions. The effects of the perturbations Δz_n^h about the mean value will disappear as h → 0.
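The randomization is verified in one line: the mean increment of the two allowed moves points in the required reflection direction (2, 1). (A sketch; the numbers are ours.)

    import numpy as np

    # Moves (h, 0) and (h, h), each with probability 1/2, have mean (h, h/2),
    # which is proportional to the reflection direction (2, 1).
    h = 0.1
    mean = 0.5 * np.array([h, 0.0]) + 0.5 * np.array([h, h])
    print(mean, mean[0] / mean[1])    # [h, h/2], ratio 2.0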
u(·) dropped. For a stopping time τ and a discount factor β > 0, let the cost for x(·) be the discounted form of (3.2.1):

(8.2)

where we recall the notation t_n^h = Σ_{i=0}^{n-1} Δt_i^h and Δt_n^h = Δt^h(ξ_n^h). From Section 2.2, the dynamic programming equation for the infima of the costs is

Due to the discounting, all the costs and equations are well defined. Any acceptable approximation to e^{−βΔt^h(x)} can be used; for example, if h is small, then one can use either of

1 - \beta\,\Delta t^h(x) \qquad \text{or} \qquad \frac{1}{1 + \beta\,\Delta t^h(x)}.
Thus we require that the stopping times for x(·) be no larger than τ'. In this case, (8.3) holds for x ∈ G_h = G ∩ S_h; otherwise,

(8.5)
The Interpolated Process ψ^h(·). Recall the continuous parameter Markov chain interpolation ψ^h(·) from Section 4.3 or Subsection 5.7.3. For a stopping time τ for ψ^h(·), an appropriate analogue of the cost function (8.1) is

By the results in Section 4.3, (8.8) is also the cost for the discrete parameter chain (with τ = τ_N^h being the interpolated time which corresponds to N) if the discount factor e^{−βΔt^h(x)} is approximated by 1/[1 + βΔt^h(x)]. Otherwise it is an approximation, with an error which goes to zero as h → 0.
Let u_n^h = u(ξ_n^h) for a feedback control u(·). If the sum is well defined and bounded for each x under u(·), then W^h(x, u) satisfies the equation

W^h(x, u) = \sum_y e^{-\beta\,\Delta t^h(x, u(x))}\,p^h(x, y|u(x))\,W^h(y, u) + k(x, u(x))\,\Delta t^h(x, u(x))   (8.10)

for x ∈ G_h^0, and with the boundary condition
where τ_h is the first escape time of ψ^h(·) from G^0, and u^h(·) is the continuous parameter interpolation of {u_n^h, n < ∞}. The remarks which were made in the previous subsection concerning the equivalence of this cost with that for the discrete parameter chain hold here also.
(8.14)

for x ∈ G_h^0. Define the matrix R^h(u) = {r^h(x, y|u(x)); x, y ∈ G_h^0}, where

(8.15)

for x ∈ G_h^0. Then we can write the equation for the cost (8.10) and the dynamic programming equation (8.12) as
5.8.3 Reflecting boundary

Now suppose that the boundary is "instantaneously" reflecting as in Section 5.7, and the transition probabilities for the reflecting states do not depend on the control. The approximating chain is assumed to be locally consistent with the diffusion (3.1) or the jump diffusion (6.1) in G_h = S_h ∩ G, and with the reflection directions r(x) on ∂G_h^+, where the reflecting boundary is disjoint from G_h. Recall that the interpolation intervals Δt^h(x) equal zero for points in ∂G_h^+. An obligatory stopping set and associated stopping cost can be added and will be commented on briefly below.
Suppose that the cost for (7.1) takes either the form (8.18a) or (8.18b):

where c(·) is bounded and continuous and c(x)'γ ≥ 0 for any γ ∈ r(x), x ∈ ∂G. When G is a convex polyhedron and (7.3) holds, then we use

where c_i ≥ 0.

Let u = {u_n^h, n < ∞} be an admissible control sequence. Write the "conditional mean" increment when the state ξ_n^h = x is in ∂G_h^+ as Δz^h(x) = E_{x,n}^h\,\Delta\xi_n^h, and (for the case of polyhedral G and the representation (7.3))
(8.19b)
Let u_n^h = u(ξ_n^h) for a feedback control u(·). Then, if the sum in (8.19) is well defined and bounded, W^h(x, u) satisfies (8.10) for x ∈ G_h. For x ∈ ∂G_h^+ and (8.19a) we have

The equation for the optimal value function is (8.12) for x ∈ G_h, and for x ∈ ∂G_h^+ it is
A Vector Form of the Equations for the Cost. The formulas (8.16) and (8.17) hold with appropriate redefinitions. Redefine the vector W^h(u) = {W^h(x, u), x ∈ G_h ∪ ∂G_h^+} and similarly redefine the vector Y^h. For feedback u(·), redefine R^h(u) to have the components {r^h(x, y|u(x)), x, y ∈ G_h ∪ ∂G_h^+}, where (8.15) holds for x ∈ G_h, and for x ∈ ∂G_h^+ use

(8.22)
Eliminating the Reflection States. The states in the reflection set ∂G_h^+ can be eliminated from the state space if desired, because their transition probabilities do not depend on the control and their interpolation intervals are zero. But, from the point of view of programming convenience (the automatic and simple generation of the transition probabilities and intervals) it seems to be simpler to keep these states. We will illustrate the method of eliminating these states for one particular case, and this should make the general idea clear. Refer to Figure 5.10.
Example 1. Fix (x, α). Suppose that the scaling and the ordering of the components of the state are such that σ(x, α) = (1, k + γ(x, α)) for 0 < γ(x, α) < 1. First, let us approximate the one step transition by randomizing between the values

x → x ± [e₁ + e₂k]h, with probability p₁/2, each,
x → x ± [e₁ + e₂(k + 1)]h, with probability p₂/2, each,

where p₁ + p₂ = 1. Then

E_{x,n}^{h,\alpha}\big[\xi_{n+1}^h - x\big]\big[\xi_{n+1}^h - x\big]' = h^2\,C(x,\alpha), \qquad \Delta t^h(x,\alpha) = h^2,   (9.2)
where

C(x,\alpha) = \begin{bmatrix} 1 & k + p_2 \\ k + p_2 & k^2 + p_2(2k+1) \end{bmatrix}.

We have C₁₁(x, α) = σ₁²(x, α) = 1, and would like to match the remaining elements C₁₂(x, α) and C₂₂(x, α) with the analogous elements of a(x, α). But there is only one parameter that can be selected, namely p₂. First choose p₂ so that C₁₂(x, α) = σ₁(x, α)σ₂(x, α) = a₁₂(x, α). Then p₂ = γ(x, α), and the relative numerical noise for the (2,2) component is

\frac{C_{22}(x,\alpha) - \sigma_2^2(x,\alpha)}{\sigma_2^2(x,\alpha)} = \frac{\gamma(x,\alpha)\big(1 - \gamma(x,\alpha)\big)}{\big(k + \gamma(x,\alpha)\big)^2} = O\Big(\frac{1}{k^2}\Big).   (9.3)

Alternatively, choosing p₂ such that C₂₂(x, α) = σ₂²(x, α) yields

p_2 = \frac{\gamma(x,\alpha)\big(2k + \gamma(x,\alpha)\big)}{2k + 1} \le \gamma(x,\alpha),

and the relative numerical noise for the (1,2) component is then

\frac{\sigma_1(x,\alpha)\sigma_2(x,\alpha) - C_{12}(x,\alpha)}{\sigma_1(x,\alpha)\sigma_2(x,\alpha)} = \frac{\gamma(x,\alpha)\big(1 - \gamma(x,\alpha)\big)}{(2k + 1)\big(k + \gamma(x,\alpha)\big)} = O\Big(\frac{1}{2k^2}\Big).

Thus, although both procedures seem reasonable, we see that neither gives local consistency if γ(x, α) is neither 0 nor 1, although the relative numerical noise decreases rapidly as k increases. If we simply used p₂ = 0 or 1, then the relative numerical noise would be O(1/k), which illustrates the advantages of randomization.

The solution to the optimal control problem is often relatively insensitive to small numerical noise, even up to 5-10% (which has an effect similar to that due to adding noise to the dynamics). One must experiment.
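The trade-off in Example 1 can be tabulated directly; the helper below (ours) matches the (1,2) covariance entry with p₂ = γ and reports the residual (2,2) noise of (9.3).

    # Randomization of Example 1: with p2 = gamma the (1,2) entry is exact,
    # and the residual relative noise in the (2,2) entry is (9.3).
    def relative_noise_22(k, gamma):
        p2 = gamma                              # gives C12 = k + gamma = a12
        C22 = (1 - p2) * k**2 + p2 * (k + 1)**2
        a22 = (k + gamma)**2
        return (C22 - a22) / a22                # = gamma(1-gamma)/(k+gamma)^2

    for k in (1, 5, 20):
        print(k, relative_noise_22(k, 0.5))     # decays like 1/k^2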
Example 2. Fix (x, α). Now, extending the above example, suppose that the scaling is such that σ(x, α) = (k₁, k₂ + γ(x, α)), where k₂ ≥ k₁. Construct an approximating chain by using the transitions

x → x ± [e₁k₁ + e₂k₂]h, with probability p₁/2, each,
x → x ± [e₁k₁ + e₂(k₂ + 1)]h, with probability p₂/2, each.

Defining the matrix C(x, α) analogously to what was done above, we have σ₁²(x, α) = C₁₁(x, α) for any p₂. Choosing p₂ so that C₁₂(x, α) is equal to σ₁(x, α)σ₂(x, α) yields p₂ = γ(x, α), and the relative numerical noise for the (2,2) component is

\frac{C_{22}(x,\alpha) - \sigma_2^2(x,\alpha)}{\sigma_2^2(x,\alpha)} = \frac{\gamma(x,\alpha)\big(1 - \gamma(x,\alpha)\big)}{\big(k_2 + \gamma(x,\alpha)\big)^2} = O\Big(\frac{1}{k_2^2}\Big).
If we simply set p₂ = 0, then the relative numerical noise for the (2,2) component is

x → x ± [e₁k₁^h(x, α) + e₂k₂^h(x, α)]h, each with probability 1/2.   (9.5)

Then the conditional covariance matrix (9.2) is

(9.7)

(9.8)
If there are no integers such that (9.4) holds, then choose k^h → ∞ but such that k^h h → 0. Then choose k₁^h(x, α), k₂^h(x, α) such that they go to infinity as h → 0, are no greater than k^h, and satisfy (which defines γ^h(x, α))

then there is local consistency in the (1,1) component. The relative numerical noise for the (2,2) component is then

C_{11}(x,\alpha) = \big(k_1^h(x,\alpha)\big)^2,
C_{12}(x,\alpha) = C_{21}(x,\alpha) = k_1^h(x,\alpha)\big(k_2^h(x,\alpha) + \gamma^h(x,\alpha)\big),
C_{22}(x,\alpha) = \big(k_2^h(x,\alpha) + \gamma^h(x,\alpha)\big)^2 + \gamma^h(x,\alpha)\big(1 - \gamma^h(x,\alpha)\big).   (9.13)

Let Δt^h(x, α) be given by (9.10). Then there is local consistency in the (1,1) and (1,2) components. The relative numerical noise in the (2,2) component is

\frac{\gamma^h(x,\alpha)\big(1 - \gamma^h(x,\alpha)\big)}{\big(k_2^h(x,\alpha) + \gamma^h(x,\alpha)\big)^2} = O\Big(\frac{1}{(k_2^h(x,\alpha))^2}\Big).   (9.14)

Thus, randomization is preferable, but it involves a more complex code. Of course, in any application, one uses only a few fixed small values of h. But the above comments serve as a useful guide. To apply the procedure in the various numerical algorithms of the next chapter, the set U will usually be approximated by a finite set U^h.
6
Computational Methods for
Controlled Markov Chains
The chapter presents many of the basic ideas which are in current use for the
solution of the dynamic programming equations for the optimal control and
value function for the approximating Markov chain models. We concentrate
on methods for problems which are of interest over a potentially unbounded
time interval. Numerical methods for the ergodic problem will be discussed
in Chapter 7, and are simple modifications of the ideas of this chapter.
Some approaches to the numerical problem for the finite time problem will
be discussed in Chapter 12.
The basic problem and the equations to be solved are stated in Section
6.1. Section 6.2 treats two types of classical methods: the approximation in
policy space method and the approximation in value space method. These
methods or combinations of them have been used since the early days of
stochastic control theory, and their various combinations underlie all the
other methods which are to be discussed. The first approach can be viewed
as a "descent" method in the space of control policies. The second method
calculates an optimal n-step control and value function and then lets n go
to infinity. The Jacobi and Gauss-Seidel relaxation (iterative) methods are
then discussed. These are fundamental iterative methods which are used
with either the approximation in policy space or the approximation in value
space approach. When the control problem has a discounted cost, then one
can improve the performance of the iterations via use of the bounds given in
Section 6.3. The so-called accelerated Gauss-Seidel methods are described
in Section 6.4. This modification generally yields much faster convergence,
and this is borne out by the numerical data which is presented.
The possible advantages of parallel processing are as interesting for the
control problem as for any of the other types of problems for which it has
been considered. Although it is a vast topic, we confine our remarks to the
discussion of several approaches to domain decomposition in Section 6.5.
The size of the state spaces which occur in the approximating Markov chain
models can be quite large and one would like to do as much of the compu-
tation as possible in a smaller state space. Section 6.6 discusses the basic
idea of grid refinement, where one first gets a rough solution on a coarser
state space, and then continues the computation on the desired "finer"
state space, but with a good initial condition obtained from the "coarse"
state space solution. Section 6.7 outlines the basic multigrid or variable grid
idea, which has been so successful for the numerical solution of many types
of partial differential equations. This method can be used in conjunction
with all of the methods discussed previously. With appropriate adaptations
of the other methods, the result is effective and robust and experimental
evidence suggests that it might be the best currently available. There are
comments on the numerical properties throughout, and more such com-
ments appear in Chapter 7. In Section 6.8, the computational problem is
set up as a linear programming problem. The connections between approx-
imation in policy space and the simplex algorithm are explored and it is
shown that the dual equation is simply the dynamic programming equation.
Some useful information on computation and some interesting examples are
in [15, 19, 135, 150]. An expert system which incorporates some forms of
the Markov chain approximation method is discussed in [20, 21].
(1.1)

and

(1.2)

respectively.
This chapter is concerned with a variety of methods which have been
found useful for the solution of these equations. Many of the methods come
from or are suggested by methods used for the solution of the systems of
linear equations which arise in the discretization of PDE's, as, for example,
in [142].
Because we will generally be concerned with the solution of equations [such as (1.1) and (1.2)] for fixed values of h, in order to simplify the notation we will refer to the state space of the Markov chain simply as S and rewrite these equations, with the superscript h deleted, as
A1.1. r(x, y|α), C(x, α) are continuous functions of α for each x and y in S.

A1.2. (i) There is at least one admissible feedback control u₀(·) such that R(u₀) is a contraction, and the infima of the costs over all admissible controls are bounded from below. (ii) R(u) is a contraction for any feedback control u(·) for which the associated cost is bounded.

A1.3. If the cost associated with the use of the feedback controls u₁(·), ..., u_n(·), ... in sequence, is bounded, then
Richardson Extrapolation. Suppose that the optimal cost for the original control problem for the diffusion is V(x) and that for the Markov chain approximation it is V^h(x). Suppose that the solutions are related by (for appropriate values of x)

V^h(x) = V(x) + B(x)h + o(h),

so that the extrapolation 2V^{h/2}(x) − V^h(x) = V(x) + o(h) cancels the first order error term. While there is no current theory to support this practice in general for all of the problems of interest in this book, it generally yields good results, if reasonably accurate solutions for small enough parameters h and h/2 are available. The scheme is known as Richardson extrapolation [32]. The general problem of selecting the approximation for best accuracy and the appropriate extrapolations needs much further work.
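As a concrete illustration of the extrapolation step, the following sketch combines two computed solutions held in arrays on a common set of grid points; the array names and the made-up error profile are assumptions for illustration only, not data from the text.

```python
import numpy as np

def richardson(V_h, V_h2):
    # If V^h(x) = V(x) + B(x) h + o(h), then 2 V^{h/2}(x) - V^h(x) = V(x) + o(h):
    # the first order error term cancels.
    return 2.0 * V_h2 - V_h

# Illustration with an invented error profile V^h = V + B*h:
grid = np.linspace(0.0, 1.0, 11)
V = grid**2                       # stand-in for the true values V(x)
B = np.sin(np.pi * grid)          # stand-in for the error coefficient B(x)
h = 0.1
V_h, V_h2 = V + B * h, V + B * (h / 2.0)
print(np.max(np.abs(richardson(V_h, V_h2) - V)))   # ~0 up to rounding
```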
Theorem 2.1. Assume (A1.1) and (A1.2). Then there is a unique solution to (1.4), and it is the infimum of the cost functions over all time independent feedback controls. Let u_0(·) be an admissible feedback control such that the cost W(u_0) is bounded. For n ≥ 1, define the sequence of feedback controls u_n(·) and costs W(u_n) recursively by (1.3) together with the formula

u_{n+1}(x) = arg min_{a∈U} [ Σ_y r(x, y|a) W(y, u_n) + C(x, a) ].   (2.1)

Then W(u_n) → V.
Under the additional condition (A1.3), V is the infimum of the costs over all admissible control sequences.
Proof. To prove the uniqueness of the solution to (1.4), suppose that there are two solutions V and V̄, with minimizing controls denoted by u(·) and ū(·), respectively. Recall that all inequalities between vectors are taken component by component. Then we can write

V = R^n(u)V + Σ_{i=0}^{n−1} R^i(u) C(u).   (2.2)
The right hand side sum in (2.2), which is obtained by iterating (1.4), is bounded. Hence, by (A1.2), R(u) is a contraction, and similarly for R(ū). Consequently, the right sum of (2.2) converges to the cost function W(u) = V. We can write

V = min_{u(x)∈U} [R(u)V + C(u)].   (2.3a)

The inequality

R(ū)V̄ + C(ū) ≤ R(u)V̄ + C(u)

implies that

W ≤ min_{u(x)∈U} [R(u)W + C(u)].   (2.6)
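For a chain with finitely many states and actions, the approximation in policy space iteration of Theorem 2.1 is straightforward to code. The sketch below alternates exact solution of (1.3) with the update (2.1); the toy transition data are invented for illustration, and R(u) = βP(u) is a contraction only because of the discount factor β < 1.

```python
import numpy as np

def policy_iteration(r, C, tol=1e-10, max_iter=100):
    """Approximation in policy space for V = min_a [r(.,.|a) V + C(.,a)].

    r: array of shape (A, n, n), substochastic matrices r(x,y|a)
       (each R(u) must be a contraction, as in (A1.2));
    C: array of shape (A, n), one-step costs C(x,a).
    """
    A, n, _ = r.shape
    u = np.zeros(n, dtype=int)                 # initial feedback control u_0
    for _ in range(max_iter):
        # Policy evaluation: solve (I - R(u)) W = C(u) exactly, as in (1.3).
        Ru = r[u, np.arange(n), :]
        Cu = C[u, np.arange(n)]
        W = np.linalg.solve(np.eye(n) - Ru, Cu)
        # Policy improvement: the update (2.1).
        q = np.einsum('axy,y->ax', r, W) + C   # q[a,x] = sum_y r(x,y|a)W(y) + C(x,a)
        u_new = np.argmin(q, axis=0)
        if np.array_equal(u_new, u) or np.max(np.abs(q.min(axis=0) - W)) < tol:
            return W, u
        u = u_new
    return W, u

# An invented discounted chain: two states, two actions, discount 0.9.
beta = 0.9
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.1, 0.9], [0.6, 0.4]]])
C = np.array([[1.0, 2.0], [1.5, 0.5]])
V, u = policy_iteration(beta * P, C)
print(V, u)
```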
Theorem 2.2. Let u(·) be an admissible feedback control such that R(u) is a contraction. Then for any initial vector W_0, the sequence W_n defined by

W_{n+1} = R(u)W_n + C(u)   (2.7)

converges to the solution of (1.3). Assume (A1.1)-(A1.3) and, for any V_0, define V_n recursively by

V_{n+1} = min_{u(x)∈U} [R(u)V_n + C(u)].   (2.8)

Then V_n is the minimal cost for an n-step problem with terminal cost vector V_0.
The proof identifies V_n with the n-step cost for the policy which uses u_i(·) when there are still i steps to go and with terminal cost V_0; in the corresponding expression, the empty product Π is defined to be the identity matrix. The minimizing property in (2.8) yields that, for any other admissible feedback control sequence {u_n(·)} for which the cost is bounded, V_{n+1} is bounded above by the cost for an (n + 1)-step process under the controls {u_i(·)} and terminal cost V_0. Thus, V_n is indeed the asserted minimal n-step cost,
and V_n → V. •
Theorem 2.3. Let u(·) be an admissible feedback control for which R(u) is a contraction. For any given W_0, define W_n recursively by

W_{n+1}(x, u) = Σ_{y<x} r(x, y|u(x)) W_{n+1}(y, u) + Σ_{y≥x} r(x, y|u(x)) W_n(y, u) + C(x, u(x)).   (2.10)

Then W_n converges to the unique solution to (1.3).
Assume (A1.1)-(A1.3). For any V_0, define V_n recursively by

V_{n+1}(x) = min_{a∈U} [ Σ_{y<x} r(x, y|a) V_{n+1}(y) + Σ_{y≥x} r(x, y|a) V_n(y) + C(x, a) ].   (2.11)
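A minimal sketch of one Gauss-Seidel sweep (2.10) for a fixed control, using the natural ordering of the states; the random substochastic matrix below is an invented stand-in for r(x, y|u(x)).

```python
import numpy as np

def gauss_seidel_sweep(W, r, C):
    """One sweep of (2.10): states are visited in increasing order, and values
    already updated in this sweep (y < x) are used immediately."""
    W = W.copy()
    for x in range(len(W)):
        W[x] = r[x, :x] @ W[:x] + r[x, x:] @ W[x:] + C[x]
    return W

# Invented contraction: row sums 0.9, so a unique solution of (1.3) exists.
rng = np.random.default_rng(0)
r = 0.9 * rng.dirichlet(np.ones(5), size=5)
C = rng.random(5)
W_true = np.linalg.solve(np.eye(5) - r, C)
W = np.zeros(5)
for _ in range(100):
    W = gauss_seidel_sweep(W, r, C)
print(np.max(np.abs(W - W_true)))   # small
```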
Remark. It is useful to note that there are r̃(x, y|u) and C̃(x, u) such that (2.10) can be written in the form

W_{n+1}(x, u) = Σ_y r̃(x, y|u) W_n(y, u) + C̃(x, u).   (2.12)

Let x(1) denote the lowest state in the ordering. Then the terms r̃(x, y|u) and C̃(x, u) in (2.12) are defined in the following recursive way:

r̃(x(1), y|u) = r(x(1), y|u(x(1))),
r̃(x, y|u) = r(x, y|u(x)) + Σ_{z<x} r(x, z|u(x)) r̃(z, y|u),   y ≥ x ≥ x(1),
r̃(x, y|u) = Σ_{z<x} r(x, z|u(x)) r̃(z, y|u),   x > y ≥ x(1),
   (2.13)

and

C̃(x, u) = Σ_{y<x} r(x, y|u(x)) C̃(y, u) + C(x, u(x)).   (2.14)
(3.4)
For example, for each x choose the value which is at the midpoint of the
range in (3.4).
"( [2:
y<x
ro(x, yiu(x))Wn+I(Y) +L
y~x
ro(x, yiu(x))Wn(Y)l + C(x, u(x)).
(3.5)
Referring to (2.13) and (2.14), rewrite the iteration (3.5) in the form W_{n+1} = R̃(u)W_n + C̃(u).

6.4 Accelerated Jacobi and Gauss-Seidel Methods

For acceleration parameters w(x), the accelerated form of the Jacobi iteration is

W_{n+1}(x) = w(x) [ Σ_y r(x, y|u(x)) W_n(y) + C(x, u(x)) ] + (1 − w(x)) W_n(x).   (4.1)
If w(x) > 1 for all x, then this procedure is known as the accelerated Jacobi (AJ) method, where the w(x) are the acceleration parameters. If w(x) < 1, then it is called the weighted Jacobi procedure [16].
There are two forms of the accelerated Gauss-Seidel method, depending on whether the "acceleration" is done at the end of a Gauss-Seidel sweep or continuously during the sweep. Let w(x) > 1. The first, which we call the semi-accelerated Gauss-Seidel method (SAGS), is defined by the recursion

W̄_n(x) = Σ_{y<x} r(x, y|u(x)) W̄_n(y) + Σ_{y≥x} r(x, y|u(x)) W_n(y) + C(x, u(x)),
W_{n+1}(x) = w(x) W̄_n(x) + (1 − w(x)) W_n(x).
   (4.2)
There is a similar expression for (4.2), where R(u) is replaced by the R̃(u) defined in (2.13). If λ is an eigenvalue of R(u), then wλ + (1 − w) is an eigenvalue of R_w(u). The idea is to choose a value of w such that the spectral radius of R_w(u) is less than that of R(u). The accelerated Jacobi procedure
is not generally useful, but the accelerated Gauss-Seidel is. Examples will
be given below. It will be seen below that the AGS is actually the best pro-
cedure among these choices, in the sense that it gives the smallest spectral
radius with appropriate choices of the parameter w.
Generally some experimentation is needed to get appropriate values of
the factor w. This is often easier than it might seem at first sight. For the
approximation in policy space method, one solves a sequence of problems
of the type (1.3). For many problems, it has been experimentally observed
that the range of possible w (those for which the algorithm is stable) as well
as the optimal values of w do not change much from one control to another.
This point will also be pursued further in the example of Subsection 6.4.3.
Also, in doing numerical optimization, one frequently wants to solve a fam-
ily of optimization problems with closely related cost functions or system
dynamics, in order to get controls which are reasonably robust with respect
to possible variations in these quantities. Experimentation with the value
of w in the early stages of such a procedure generally yields useful values for
the rest of the procedure. Accelerated procedures have also been used with
the nonlinear iteration in value space algorithm (1.4). See [105], which first
introduced these "acceleration" methods for the computation of optimal
controls.
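The experimentation can be mimicked numerically whenever the iteration matrix is available: scan a range of acceleration parameters and keep the one with the smallest spectral radius of R_w(u) = wR(u) + (1 − w)I. The diagonal matrix below is an invented stand-in whose eigenvalues lie in [0, .99], the "one sided" situation typical of the Gauss-Seidel matrices.

```python
import numpy as np

def best_weight(R, weights):
    """Scan acceleration parameters: the eigenvalues of R_w = w R + (1 - w) I
    are w*lam + (1 - w).  Return the w minimizing the spectral radius."""
    radii = []
    for w in weights:
        Rw = w * R + (1.0 - w) * np.eye(R.shape[0])
        radii.append(np.max(np.abs(np.linalg.eigvals(Rw))))
    i = int(np.argmin(radii))
    return weights[i], radii[i]

# Invented example: eigenvalues in [0, 0.99] (one-sided, as for Gauss-Seidel).
R = np.diag(np.linspace(0.0, 0.99, 10))
w, rho = best_weight(R, np.linspace(1.0, 2.0, 101))
print(w, rho)    # best w near 2/(2 - 0.99), consistent with (4.7) below
```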
0 .9900 0 0
.4975 0 .4975 0
0 .4975 0 .4975
0 0 .9900 0
Table 4.2. The matrix λP_N.
.4975 .4975 0 0
.2475 .2475 .4975 0
.1231 .1231 .2475 .4975
.0613 .0613 .1231 .7450
Table 4.3. The matrix for the Gauss-Seidel iteration using λP.
0 .9900 0 0
0 .4925 .4975 0
0 .2450 .2475 .4975
0 .2426 .2450 .4925
Table 4.4. The matrix for the Gauss-Seidel iteration using λP_N.
Because the case at hand, where the states are ordered on the line, is so
special, we need to be careful concerning any generalizations that might be
made. However certain general features will be apparent. With the Gauss-
Seidel forms, the state transitions are not just to nearest neighbors, but to
all the points to which the neighbors are connected, in the direction from
which the sweep comes.
The eigenvalues of the matrix λP are in the interval [−.995, .995], those of λP_N are in [−.9933, .9933], those of the matrix in Table 4.3 in [0, .9920], and those of the matrix in Table 4.4 in the interval [0, .9867]. (They are all real valued.) The eigenvalues of the matrices used for the Jacobi iterations, λP and λP_N, are real and symmetric about the origin in this case. The
"shift of eigenvalues" argument made in connection with (4.4) above or
(4.7) below, implies that the use of (4.1) is not an improvement over the
Jacobi method if w > 1. The "one sided" distribution of the eigenvalues
for the Gauss-Seidel case suggests that the methods (4.2) and (4.3) might
yield significant improvement over the basic methods.
To see this point more clearly, consider the same example as above, but
with the smaller value of the difference interval h = 1/20. The results
are tabulated in the tables below, where the subscript N denotes that the normalized matrix P_N is used as the basis of the calculation of the matrix for the Gauss-Seidel or accelerated cases, as appropriate. For this case, the spectral radius of the Jacobi matrix λP is 0.9950, that for the normalized Jacobi matrix λP_N is 0.9947, and that for the normalized Gauss-Seidel case
is 0.9895. For the accelerated procedure, we have the following spectral radii
for the listed cases
w         1.7      1.5
SAGS_N    .9821    .9842
AGS       .9650    .9766
AGS_N     .9455    .9699

Table 4.5. Spectral radii for the accelerated procedures.
The best procedure is the full AGS_N, although the acceleration obtained
here with AGS_N is greater than one would normally get for problems in
higher dimensions. Data for a two dimensional case in [104] support this
preference for AGS in the case of the nonlinear iteration in value space
algorithm also.
which yields

w = 2 / (2 − (b − a)).   (4.7)

Thus for larger (b − a), use a larger value of w.
6.4.3 Example
Suppose that the approximation in policy space algorithm of Theorem 2.1 is used to solve (1.4). Then a sequence of solutions u_n(·) is generated, and one wants to get an estimate of the solution to (1.3) for each such control. We comment next on experimental observations of the sequence of optimal acceleration parameters. The Markov chain is obtained as an approximation to the following system, where x = (x_1, x_2) and |u| ≤ 1:

dx_1 = x_2 dt,
dx_2 = (b_2(x) + u) dt + dw.
   (4.8)
The state space is defined to be the square G = {x : |x_i| ≤ B, i = 1, 2} centered about the origin. We let the outer boundary of the square be
reflecting and stop the process at the first time τ that the strip {x : |x_1| ≤ δ}, δ < B, is hit. Let the cost be W(x, u) = E_x^u ∫_0^τ [1 + |u(s)|] ds. To start the approximation in policy space method, suppose that the initial control u_0(·) is identically zero, get an approximation to the cost W(u_0), and continue as in Theorem 2.1.
Suppose that we use the AGS method to get approximate solutions to the linear equations (1.3) for the costs W(u_n). The range of acceleration parameters for which the algorithm is stable and the best acceleration parameter were observed to decrease slightly as n increases, but not by much. For n = 0, the best weight is slightly greater than 1.3. This would be unstable for the optimal control, for which the best parameter is about 1.25. Even when accelerated, the convergence of the AGS methods for (1.3) was observed to be slower for the initial control than for the optimal control. Let ū(·) denote the optimal control. Then to solve (1.3) to within a given precision takes more than five times longer with R(u_0) than with R(ū). These observations hold true for various functions b_2(·). The conclusions are problem-dependent, but analogous patterns appear frequently.
For this example, the process is absorbed faster for the optimal con-
trol than for the initial members of the sequence of controls generated by
the policy iteration method. Intuitively, faster absorption means a smaller
spectral radius. For other cases, the best acceleration parameter increases
slightly as n increases.
The method of Section 6.7 first gets a good approximation to the solution of the optimization problem on a coarse grid (state space), and then uses that approximation as the initial condition for the solution on a finer grid.
We observe experimentally for problems such as the one above that the
optimal acceleration parameter (for the optimal controls) increases slightly
as the grid is refined. This seems to be another consequence of the relative
speed of absorption or killing of the process, which is faster on the coarser
grid. Thus, good values of the weight for the coarse grid would provide a
good start for determining a good weight for the finer grid.
method of Section 6.8 are a topic of active research, and the reader can find
useful ideas in the proceedings of many conferences devoted to the subject
[117] and in [128].
In principle, the computations (2.7) for the Jacobi method can be done simultaneously for each of the states, because the values of the (n + 1)-st iterate W_{n+1}(x, u) for all x ∈ S are computed directly from the values of the n-th iterate W_n(y, u), y ∈ S. On the other hand, the computation of W_{n+1}(x, u) in (2.10) for each state x depends on the values of the (n + 1)-st iterate W_{n+1}(y, u) for the states y < x. Thus a simultaneous calculation of all W_{n+1}(x, u) via a parallel processor is not possible. Many intermediate
variations have been designed with the aim of preserving some of the ad-
vantages of the Gauss-Seidel method, but allowing the use of some parallel
processing. One can, in fact, mix the Gauss-Seidel and Jacobi procedures
in rather arbitrary ways, with the state space being divided into disjoint
groups of points, with a Gauss-Seidel procedure used within the groups
and the groups connected via the Jacobi procedure. Such is the case with
the three examples below. One well known scheme, called the red-black
Gauss-Seidel method (for obvious reasons!) will be described next [16]. In
fact, with an appropriate ordering of the states, this method is actually a
Gauss-Seidel method.
For simplicity, suppose that the boundary is absorbing and let the state
space consist of the points of intersection of the grid lines in the figure.
The circles are to be called "black," and the others (excluding the bound-
ary) "red." Suppose that the transitions of the Markov chain from a black
point are either to the red points or to the boundary (but not to other
black points), and that the transitions from the red points are either to
the black points or to the boundary (but not to other red points). This would be the case if the transition probabilities were obtained by the finite difference method of Sections 5.1-5.3 applied to a diffusion (no jump term and diagonal σ(x)σ′(x)) model.
Suppose that we wish to solve (1.3). The procedure for each iteration is as follows: Let W_n(u) be the n-th estimate of the solution to (1.3). First iterate on all of the black points, using a Jacobi relaxation (the Gauss-Seidel procedure would give the same result, because the black states do not communicate to other black states), obtaining W_{n+1}(x, u) at the black points x. Then, using these newly computed values, iterate on all the red points with a Jacobi procedure to get W_{n+1}(x, u) for the red points x. The
procedure has divided the state space into two groups, each group having
roughly half of the states, and the computation within each group can be
implemented in a parallel fashion.
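The following sketch carries out red-black sweeps for an invented five-point model on a square grid with an absorbing boundary; transitions from a black point go only to red points or the boundary and conversely, so each color class can be updated simultaneously.

```python
import numpy as np

def red_black_sweep(W, C, absorbing):
    """One red-black sweep for a nearest-neighbor chain on a 2-d grid, where
    each interior point averages its four neighbors and adds a running cost.
    All black points (i + j even) can be updated in parallel, then all red
    ones.  `absorbing` marks boundary points whose values are held fixed."""
    n, m = W.shape
    i, j = np.meshgrid(np.arange(n), np.arange(m), indexing='ij')
    for color in (0, 1):                       # 0: black, 1: red
        mask = ((i + j) % 2 == color) & ~absorbing
        Wn = np.zeros_like(W)
        Wn[1:-1, 1:-1] = 0.25 * (W[:-2, 1:-1] + W[2:, 1:-1]
                                 + W[1:-1, :-2] + W[1:-1, 2:]) + C[1:-1, 1:-1]
        W = np.where(mask, Wn, W)              # Jacobi step within one color
    return W

# Invented toy problem: zero boundary data, constant running cost.
n = 16
W = np.zeros((n, n))
C = 0.01 * np.ones((n, n))
absorbing = np.zeros((n, n), dtype=bool)
absorbing[0, :] = absorbing[-1, :] = absorbing[:, 0] = absorbing[:, -1] = True
for _ in range(200):
    W = red_black_sweep(W, C, absorbing)
print(W[n // 2, n // 2])
```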
Let the state space S be the intersection of the lines in the figure and sup-
pose that under the given transition probability, the states communicate
only to the nearest neighbors. Thus, the points on G0 communicate with
points in both G1 and G2. The points in G1 (respectively, G2) communicate
only with points in Go and G1 (respectively, Go and G2). A standard de-
composition technique updates Wn(u) via a Gauss-Seidel or Jacobi method
(or accelerated method) in each domain G1 and G2 separately and simulta-
neously. It then uses either a Jacobi or a Gauss-Seidel procedure to update
the values at the points on the connecting set G0.
In principle, the domain can be subdivided into as many sections as
desired, at the expense of increasing overhead.
Domain Decomposition (2). Refer to Figure 6.5, where the state space
is as in the above two examples, and the states communicate only to their
nearest neighbors as well. A separate processor is assigned to update the
states on each of the "columns" in the figure. Proceed as follows. Let the
n-th estimate Wn(u) be given as above. The memory of the i-th processor
contains the values of Wn(x, u) for states x which are in the (i- 1)-st,
i-th, and (i + 1)-st column. It then obtains Wn+l (x, u) for the states x in
the i-th column by a combined Jacobi and Gauss-Seidel type procedure
which uses Wn(y,u) for yin the (i- 1)-st and (i + 1)-st column, and a
successive substitution for y in the i-th column. After this computation is
done for all columns the new values for each column are then transferred
to the processors for the neighboring columns.
might use the interpolation: V^h(x) = V^{h_2}(x), for x ∈ S_{h_2}, and for x ∈ S_h − S_{h_2}, use a linear interpolation of the values at the smallest set of points in S_{h_2} in whose convex hull x lies, with an analogous interpolation for the control. It seems preferable to do several "smoothings" via a Gauss-Seidel (accelerated or not) relaxation before the first update of the control on the finer grid, whether iteration in policy space or in value space is used.
Such a procedure can be nested. Set h = h_1 < ··· < h_k, such that the associated state spaces satisfy

S_{h_k} ⊂ S_{h_{k−1}} ⊂ ··· ⊂ S_{h_1} = S_h.   (6.1)
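In one dimension the coarse-to-fine interpolation just described takes a particularly simple form; the sketch below (with invented values) copies the common points of S_{2h} and averages the bracketing coarse values for the points of S_h − S_{2h}.

```python
import numpy as np

def refine_1d(V_coarse):
    """Interpolate a value function from a grid of spacing 2h to spacing h:
    copy the common points, and linearly interpolate between the smallest
    bracketing set of coarse points for the new in-between points."""
    n = len(V_coarse)
    V_fine = np.zeros(2 * n - 1)
    V_fine[0::2] = V_coarse                           # points in both grids
    V_fine[1::2] = 0.5 * (V_coarse[:-1] + V_coarse[1:])
    return V_fine

V_coarse = np.array([0.0, 0.3, 0.5, 0.3, 0.0])        # rough solution on S_2h
print(refine_1d(V_coarse))                            # initial condition on S_h
```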
6.7 A Multigrid Method
6.7.1 The smoothing properties of the Gauss-Seidel iteration
In this section, the state spaces will be referred to as grids. In the previ-
ous section, the problems (1.1) and (1.2) were approximately solved on a
coarse grid. Then the approximate solution and associated control, suit-
ably interpolated, were used as initial conditions for the solution of (1.1)
or (1.2) on the original fine grid. As pointed out, one can nest the proce-
dure and get an initial solution for the coarse grid by starting with an even
coarser grid. In this section, we discuss another procedure, the so-called
multigrid method, of exploiting the relative efficiency of different types of
computations at different levels of grid refinement. The multigrid method
is a powerful collection of ideas of wide applicability, and only a very brief
introduction will be given here. A fuller discussion can be found in [16, 120].
The multigrid method was introduced for use in solving optimal stochastic
control problems by Akian and Quadrat [2, 1], and a full discussion of data
and convergence, under appropriate regularity conditions, can be found in
[2]. Because variable grid sizes will be used, the basic scale factor h will be
used.
A key factor in the success of the method is the "smoothing" property
of the Gauss-Seidel relaxation. In order to understand this and to get a
better idea of which computations are best done at the "coarser level," it
is useful to compare the rate of convergence of the Gauss-Seidel procedure
when acting on smooth and on more oscillatory initial conditions. We will
do this for a discretization of the simplest problem, where the system is x(t) = x + w(t), where w(·) is a standard Wiener process, and the process is absorbed on hitting the endpoints {0, 1} of the interval of concern [0, 1].
For the discretization, let h = 1/N, where N is an integer. The absorbing
points will be deleted, since they do not influence the rate of convergence.
Then, the state space is Sh = {h, ... , 1- h}, a set with N- 1 points.
Let R_h = {r_h(x, y), x, y ∈ S_h} denote the transition matrix of the locally consistent approximating Markov chain which is absorbed on the boundary and is defined by

R_h =
[  0   1/2   0   ... ]
[ 1/2   0   1/2  ... ]
[  0   1/2   0   ... ]
[ ...  ...  ...  ... ]

Since only the role of the initial condition is of concern, set C_h(x) = 0. Consider the Gauss-Seidel relaxation W_n^h → W_{n+1}^h, defined by

W_{n+1}^h(x) = (1/2) W_{n+1}^h(x − h) + (1/2) W_n^h(x + h).   (7.1)
Refer to Figure 6.6, where h = 1/N = 1/50 and the initial condition is the
solid line. The two dotted lines show the values after 10 and 50 iterations.
Now refer to Figure 6.7, where the initial condition for the same problem
is the oscillatory solid line, and the dotted lines have the same meaning
as above. Note that the algorithm converges much faster here than in Fig-
ure 6.6 for the smoother initial condition. The situation just described is
actually the general case for Markov chain approximations to the Wiener
process with drift. Loosely speaking, the more "oscillatory" the initial con-
dition [i.e., the energy in the initial condition being essentially in the higher
eigenvalues of Rh] for the iteration (7.1), the faster the convergence. When
the initial condition in (7.1) is "smooth," it is reasonable to suppose that
we can get a quite good approximation to the solution of (1.1) by work-
ing with a coarser grid, because the errors due to the "projection" onto
the coarser grid or the "interpolation" of the resulting value back onto the
finer grid would not be "large."
See Figure 6.8, where the problem with the smooth initial condition is
treated on a grid with the coarser spacing 2h. The times required for the
computation for the cases of Figure 6.8 are approximately the same as
those for the cases in Figure 6.6, although the result is much better in
Figure 6.8. The smoother the initial condition, the better the quality of the
approximation on a coarser grid. The above comments provide an intuitive
picture of the role of the computations on the coarser grid as well as of
the smoothing property of the Gauss-Seidel relaxation. This "smoothing"
property provides one important basis for the multigrid method. Fuller and
very enlightening discussions appear in the introductory books [16, 120].
The cited "smoothing" property of the Gauss-Seidel relaxation is not quite shared by the Jacobi relaxation: W_{n+1}^h = R_h W_n^h + C_h. Consider the initial condition W_0^h defined by W_0^h(x) = sin(kπx), 1 ≤ k < N. Then the iteration does converge faster as k increases, up to about the "midpoint"
k = N/2, after which the convergence slows down again. But the weighted
Jacobi method can be used to get the faster convergence for the more
oscillatory initial conditions, for weights w less than unity (recall that we
used weights greater than unity in Section 6.4, but there the weighting
served a different purpose). See [16, Chapter 2] for a fuller discussion.
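The damping of the individual modes is easy to tabulate for the random walk matrix R_h, whose eigenvalues are cos(kπh) with eigenvectors sin(kπx): a weighted Jacobi sweep multiplies mode k by w cos(kπh) + (1 − w). The sketch below compares w = 1 with the common smoothing choice w = 2/3; the particular weight is an illustrative choice, not taken from the text.

```python
import numpy as np

N = 50
h = 1.0 / N
k = np.arange(1, N)
lam = np.cos(k * np.pi * h)     # eigenvalues of R_h; eigenvectors sin(k*pi*x)

for w in (1.0, 2.0 / 3.0):
    damp = np.abs(w * lam + (1.0 - w))     # per-sweep damping of mode k
    # Plain Jacobi (w = 1) barely damps k near 1 and near N-1, while w = 2/3
    # damps every oscillatory mode (k >= N/2) by a factor of at least 1/3.
    print(w, damp[:3].round(3), damp[N // 2:].max().round(3))
```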
The so-called smoothing properties of the Gauss-Seidel and weighted
Jacobi iteration are shared by systems other than the "random walk" dis-
cretization of the Wiener process given above. A fuller discussion is beyond
our scope here. But we note that multigrid methods of the type discussed in
the next subsection seem to work well on all of the problem classes to which
they were applied, including the ergodic, heavy traffic and singular control
problems of Chapters 7 and 8, even when their use could not be rigorously justified.
6.7.2 A multigrid method
We will be concerned with the solution of the equation (1.1) on the state space S_h, and we rewrite (1.1) here for convenience as

W^h(u) = R^h(u) W^h(u) + C^h(u).   (7.2)

Starting with an initial guess, one first does a small number of relaxations (7.3) for (7.2), ending up with the final value, which will be denoted by W_1^h. Next, define the residual ρ_1^h via a Jacobi (or Gauss-Seidel) relaxation

ρ_1^h = R^h(u) W_1^h + C^h(u) − W_1^h,   (7.4)

and define the error δW_1^h = W^h(u) − W_1^h. Then we can write

δW_1^h = R^h(u) δW_1^h + ρ_1^h.   (7.5)

If (7.5) can be solved for δW_1^h, then the solution to (7.2) is given by

W^h(u) = W_1^h + δW_1^h.   (7.6)
Thus, I_h^{2h} acting on the function ρ(·) simply picks out the values for the points which are common to both S_h and S_{2h}. This is perhaps the crudest projection. See further comments at the end of the section.
(a) Given the initial guess W_0^h of the solution to (7.2), do n_1 accelerated Gauss-Seidel iterations, getting an updated estimate of W^h(u) which will be called W_1^h. Solve for the residual ρ_1^h by (7.4).
(b) Then, starting with initial condition δX^{2h} = 0, get an approximate solution δW_1^{2h} (say, via n_2 accelerated Gauss-Seidel iterations) to the equation

δW^{2h} = R^{2h}(u) δW^{2h} + I_h^{2h} ρ_1^h   (7.7)

on S_{2h}.
(c) Next, given δW_1^{2h}, we need to interpolate it "back to the finer" grid S_h. To do this, define an interpolation operator I_{2h}^h which takes functions defined on S_{2h} into functions defined on S_h. One natural choice is the following: Let ρ be defined on the points of S_{2h}. For x ∈ S_{2h}, set (I_{2h}^h ρ)(x) = ρ(x). For x ∈ S_h − S_{2h}, define (I_{2h}^h ρ)(x) to be the linear interpolation of the smallest set of points in S_{2h} in whose convex hull x lies. This is perhaps the simplest type of interpolation operator. The new trial solution to (7.2) is now defined by the updated W_1^h → W_1^h + I_{2h}^h δW_1^{2h}.
ρ_n^h = T^{h,u}(W_{n+1}^h, C^h(u)) − W_{n+1}^h.

3. Define ρ^{2h} = I_h^{2h} ρ_n^h, and starting with the initial value δX^{2h} = 0, do n_2 iterations of
ending with the final value δW_{n+1}^{2h}.
4. Define the residual

ρ_n^{2h} = T^{2h,u}(δW_{n+1}^{2h}, ρ^{2h}) − δW_{n+1}^{2h}.
5. Define ρ^{4h} = I_{2h}^{4h} ρ_n^{2h}, and starting with the initial value δX^{4h} = 0, do n_3 iterations of
δW_{n+1}^{h_{k−1}} → δW_{n+1}^{h_{k−1}} + I_{h_k}^{h_{k−1}} δW_{n+1}^{h_k}.

Reset the value of W_{n+1}^h as W_{n+1}^h → W_{n+1}^h + I_{h_2}^{h_1} δW_{n+1}^{h_2}, and with this new initial condition, do n_2 iterations of
oscillate slightly. But, even then the asymptotic errors have been quite small
for the problems that have been tried by the authors. In any case, much
more experimentation is needed on the great variety of stochastic control
problems.
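A minimal two-grid cycle in the spirit of steps (a)-(c), coded for the one dimensional random walk model of Subsection 6.7.1; the smoothing weight, the injection projection, and the exact coarse-grid solve are simplifying assumptions for this sketch rather than the text's exact recipe.

```python
import numpy as np

def walk_matrix(n):
    """Transition matrix R_h of the random walk absorbed at the endpoints."""
    R = np.zeros((n, n))
    i = np.arange(n - 1)
    R[i, i + 1] = 0.5
    R[i + 1, i] = 0.5
    return R

def two_grid_cycle(W, R_h, C_h, R_2h, n1=3, w=2.0 / 3.0):
    """One cycle of steps (a)-(c): smooth, restrict the residual by injection,
    solve the coarse correction equation, interpolate back, and correct."""
    for _ in range(n1):                          # (a) weighted-Jacobi smoothing
        W = w * (R_h @ W + C_h) + (1.0 - w) * W
    rho = R_h @ W + C_h - W                      # residual, as in (7.4)
    rho_2h = rho[1::2]                           # injection onto S_2h
    dW_2h = np.linalg.solve(np.eye(len(rho_2h)) - R_2h, rho_2h)   # (b)
    dW = np.zeros_like(W)                        # (c) linear interpolation back
    dW[1::2] = dW_2h
    dW[2:-1:2] = 0.5 * (dW_2h[:-1] + dW_2h[1:])
    dW[0], dW[-1] = 0.5 * dW_2h[0], 0.5 * dW_2h[-1]
    return W + dW

n = 49                                           # interior points, h = 1/50
R_h, R_2h = walk_matrix(n), walk_matrix(n // 2)
C_h = np.full(n, 1e-3)
W_true = np.linalg.solve(np.eye(n) - R_h, C_h)
W = np.zeros(n)
for _ in range(10):
    W = two_grid_cycle(W, R_h, C_h, R_2h)
print(np.max(np.abs(W - W_true)))                # drops rapidly per cycle
```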
Due to the crudity of the projection operator, as well as to the poorly
understood structure of the general problems of interest, we generally used
several accelerated Gauss-Seidel relaxations between each level of projec-
tion/interpolation. Thus these relaxations actually contribute to the con-
vergence via their contraction as well as by their smoothing property. In
much multigrid practice, acceleration is not used, and one uses fewer relax-
ations, as well as more sophisticated schemes such as the W-cycle. It was
pointed out in [64] that the smoothing property of the accelerated Gauss-
Seidel procedure deteriorates as the acceleration parameter increases to its
optimal value.
The projection operator I_{h_{k−1}}^{h_k} described above is the simplest possible. It is not a "smooth" operator, and the success of its use is increased with added smoothing relaxations. A common alternative is to define the projection (I_{h_{k−1}}^{h_k} ρ^{h_{k−1}})(x) to be the arithmetic mean of the ρ^{h_{k−1}}(y) for points y which can be reached in one step (including diagonals) from x. Alternatively, define the interpolation operator first and then define the projection operator to be its transpose. The R^{h_k} on the coarser grids can also be chosen in
a more sophisticated manner. See the references for more detail. We also
direct the reader's attention to the algebraic multigrid method [119], where
the interpolation and projection operators (as well as the coarser grids) are
tailored to the actual values of the transition probabilities.
stopping was discussed in [90]. Problems with reflected diffusion models and
constraints were dealt with in [106] and [153], and will not be discussed here.
Other references which have dealt with LP formulations of Markov chain
optimization problems are [38] and [81]. Only a few basic facts concerning
linear programming will be stated, and the reader is referred to any of the
many standard references (e.g., [9, 61]) for more detail.
The Basic LP. For a column vector b, row vector c, and a matrix A, all of compatible dimensions, the basic LP form is the following: Choose X = (X_1, ..., X_q)′ to minimize the cost cX subject to

AX = b,   X ≥ 0,   (8.1)

and the associated dual problem is: Choose Y to maximize Y′b subject to

Y′A ≤ c.   (8.2)

Let A_i denote the i-th column of A. Define the row vector D = c − Y′A. In terms of the components, D_i = c_i − Y′A_i. The so-called complementary slackness condition is

D_i X_i = 0 for all i.   (8.3)
We say that a vector X ~ 0 (respectively, Y) is primal (respectively, dual)
feasible if AX = b (respectively, Y′A ≤ c). A well known fact in LP is that
a vector X is optimal if and only if it is primal feasible, and there is a dual
feasible Y such that complementary slackness holds.
Write A = [B, NB], where B is the (nonsingular) basis matrix, and split X = (X_B, X_NB) into the basic and nonbasic variables, with X_NB = 0. Then the cost and constraints take the form

c_B X_B + c_NB X_NB = z,
B X_B + (NB) X_NB = b.
   (8.4)
We wish to replace the current basis by one with a smaller cost, if possible.
To help us see what needs to be done, the equation (8.4) will be transformed
into a form which allows us to observe directly the derivatives of the cost
with respect to each of the basic and nonbasic variables. Let us subtract
a linear combination of the rows of the second equation of (8.4) from the
first equation such that the term cBXB is canceled. The multipliers in the
linear combination are easily seen to be Y′ = c_B B^{−1}. Then rewrite (8.4) as

(0)X_B + [c_NB − Y′(NB)]X_NB + Y′b = z,   (8.5)
X_B + B^{−1}(NB)X_NB = B^{−1}b.   (8.6)
Because X_NB = 0, (8.5) implies that z = c_B B^{−1}b, the cost under the current basis. The multiplier vector Y is not necessarily dual feasible.
At the current basic solution X, equation (8.5) implies that D_i = c_i − Y′A_i is the derivative of the cost with respect to the variable X_i. Since D_i = 0 if X_i is a basic variable, the derivative of the cost (at the current vector X) with respect to a basic variable is zero. Note that if D_i ≥ 0 for
all i, then Y is dual feasible, and complementary slackness holds. In the
actual simplex procedure, B- 1 is not actually explicitly computed anew
at each update of the basis. It is calculated in a relatively simple manner
using the previous basis inverse.
If D_i ≥ 0 for all i, then the current solution is optimal, and the procedure can be stopped. Otherwise, it is possible to select an improved basic solution. To do this, define an index i_0 by

min_j D_j = D_{i_0},

and introduce the nonbasic variable X_{i_0} into the basis at the maximal level consistent with the preservation of feasibility. This will involve the elimination of one of the current basic variables from the basis. Note that

X_B = B^{−1}b − B^{−1}A_{i_0} X_{i_0}.   (8.7)
It can be seen from (8.7) that as X_{i_0} increases from the value zero, at least some of the components of the vector X_B might have to change value to assure feasibility. There are three possibilities. Case (a): X_{i_0} can increase without bound, with the cost going to −∞. This would occur if none of the components of B^{−1}A_{i_0} were positive. In this case, stop. Case (b): It might
be impossible to raise X_{i_0} above zero without driving some other basic variable X_{i_1} (which would have to have been at the zero level) negative. In this case, simply replace X_{i_1} by X_{i_0} in the basis. Case (c): X_{i_0} can be increased to a finite but nonzero level before some other basic variable, say X_{i_1}, becomes zero. Then replace X_{i_1} by X_{i_0} in the basis and set it at the
maximum level. The procedure is then repeated. The exact details of this
so-called "pivoting" procedure are not important for the discussion here.
The method is not quite a gradient procedure, since it attempts to get a
decrease in cost by increasing the value of only a single nonbasic variable
at a time, with the other variables changing their values only to the extent
necessary to maintain feasibility.
Suppose that the initial state ξ_0 is a random variable with the probability distribution

P{ξ_0 = i} = p_i > 0, for all i ∈ S.
It will turn out that the values of p_i do not affect the optimal control, unless there are added constraints [105]. Let E_p denote the expectation of functionals of the Markov chain which is associated with whatever R(u) is used and the distribution p of the initial condition ξ_0. (The u(·) will be
clear from the context.) For i ∈ S and k ≤ L, let M_{ik} denote the joint (state, control) mean occurrence times defined by

M_{ik} = E_p Σ_{n=0}^∞ I_{{ξ_n = i, u_n = a^k}}.

Thus, M_{ik} is the mean number of times that the state is i and the action is a^k simultaneously. We emphasize that these mean values are for the chain with the degenerate transition matrix R(u). The probability that action a^k is used when the state is i is

γ_{ik} = M_{ik} / Σ_j M_{ij}.   (8.8)
The occurrence times satisfy the linear constraints

Σ_k M_{jk} = p_j + Σ_{i,k} r(i, j|a^k) M_{ik},   j ∈ S,   (8.9)

and the total cost is

Σ_{j,k} C(j, a^k) M_{jk},   (8.10)

which is a linear function of the M_{ik}. The constraints (8.9) and cost (8.10) constitute the LP formulation of the optimization problem. Additional constraints of the form

Σ_{i,k} d_{ikj} M_{ik} ≤ q_j

can be added.
In a basic feasible solution, for each i ∈ S there is at most one value of k [to be called k(i)] for which M_{ik} > 0. Otherwise, more than m of the variables M_{ik} would be positive. Thus, any basic feasible solution yields a pure Markov control: There is no randomization. If additional linear constraints were added to (8.9), then the basic solution might contain randomized controls for some of the states.
Suppose that u(·) is a control which is determined by the probability law {γ_{ik}}. That is,
The Dual LP. Let Y_i denote the dual variables for the LP (8.9), (8.10). Then, by analogy to the relation between (8.1) and (8.2), we see that the dual equations to (8.9) can be written as

Y_i = min_{k≤L} [ Σ_j r(i, j|a^k) Y_j + C(i, a^k) ],   i ∈ S,

which is just the dynamic programming equation (1.4). Thus, the optimal dual variables are the minimum cost values: Y_i = V(i).
Let M_{ik(i)}, i ∈ S, denote the current set of basic variables. Then D_{ik(i)} = 0 for all i ∈ S. By examining the associated set of linear equations, we see that they are the same as (1.3) for the current control u(i) = k(i). This implies that Y_i = W(i, u). If D_{ik} ≥ 0 for all i and k, then stop, since optimality has been achieved. Otherwise, define (i_0, k_0) by

min_{i,k} D_{ik} = D_{i_0 k_0}.

Then the new control for state i_0 will be a^{k_0}. The controls for the other states i ≠ i_0 remain as before. The pivoting procedure calculates the cost for the new control. Thus, approximation in policy space is equivalent to the simplex procedure if the control for only one state (the one yielding the most negative derivative of the cost) is changed on each policy update (as opposed to the possibility of changing the controls for all of the states simultaneously). See [105].
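A sketch of the LP formulation for a small discounted problem, using scipy's linprog; all of the data below are invented, the constraints are coded in the form (8.9), and the computed basic solution exhibits a single positive M_{ik} per state, i.e., a pure Markov control.

```python
import numpy as np
from scipy.optimize import linprog

# Invented data: n states, A actions, substochastic r(x,y|a) = beta * P[a].
beta, n, A = 0.9, 3, 2
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n), size=(A, n))      # P[a, x, :] sums to 1
Cost = rng.random((A, n))
p = np.full(n, 1.0 / n)                          # initial distribution

# Variables M[x, a], flattened; the constraints (8.9) read
#   sum_a M[j, a] - sum_{x,a} r(x, j | a) M[x, a] = p_j,  all j.
A_eq = np.zeros((n, n * A))
for j in range(n):
    for x in range(n):
        for a in range(A):
            A_eq[j, x * A + a] = (j == x) - beta * P[a, x, j]
c = np.array([Cost[a, x] for x in range(n) for a in range(A)])

res = linprog(c, A_eq=A_eq, b_eq=p, bounds=(0, None), method='highs')
M = res.x.reshape(n, A)
print("optimal control:", M.argmax(axis=1))      # one positive M[i,k] per state
# The dual variables of (8.9) solve the dynamic programming equation; scipy
# exposes them (up to its sign convention) as res.eqlin.marginals.
```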
7
The Ergodic Cost Problem:
Formulation and Algorithms
approximation in policy space algorithms for the case where there is a sin-
gle ergodic class under each feedback control. The matrices P(u) which
appear in these algorithms are not contractions. To formulate the numer-
ical algorithms, a centered form of the transition matrix is introduced in
Section 7.3. This centered form enjoys the contraction property. The appropriate adaptations of the numerical algorithms of Chapter 5 are discussed
in Section 7.4.
In Section 7.5, we return to the ergodic cost problem for the approximat-
ing chain. The appropriate form of the cost function for the approximating
Markov chain is given together with a heuristic discussion of why it is
the correct one. The dynamic programming equation for the optimal value
function is also stated. Owing to the possibility of nonconstant interpolation intervals, and to our desire that the limits of the cost functions for
the chains approximate that for the original diffusion or jump diffusion,
the formulation is slightly different than the one used for the Markov chain
model of Sections 7.1-7.4. In Section 7.6, it is shown that the dynamic
programming equation given in Section 7.5 is also the correct one for the
analogous ergodic cost problem for the continuous parameter Markov chain interpolation ψ^h(·) introduced in Chapter 4.
The main difficulty in directly applying the computational methods of
Chapter 5 to the approximating chain and the ergodic cost problem stems
from the fact that the interpolation interval might depend on either the
state or the control. With this dependence, the dynamic programming
equation for the ergodic cost problem cannot usually be easily solved by
a recursive procedure. In Section 7.7, we review the procedure for getting
chains with constant interpolation intervals from general approximating
chains and discuss the computational consequences.
In Sections 7.6 and 7.7, we suppose that there is neither a cost nor a
control acting on the boundary (the boundary must be reflecting and not
absorbing). Section 7.8 gives the few changes that are needed for a more
general case.
A1.1. For each control, the state space consists of transient states plus a single communicating aperiodic class.
A1.2. C(x, ·) and p(x, y|·) are continuous functions of the control parameter.
If the lim sup is not a function of the initial condition, then it will be written as γ(u). Assumption (A1.1) does cover many applications, and it
simplifies the development considerably. It does not hold in all cases of
interest for the approximating Markov chains. For example, in the singular
control problem of Chapter 8, one can easily construct controls for which
the assumption is violated. Nevertheless, the algorithms seem to perform
well for the "usual" problems on which we have worked. Weaker conditions
are used in Theorem 1.1, under a finiteness condition on U.
The Functional Equation for the Value Function for a Given Con-
trol. Let us recapitulate some of the statements in Subsection 2.1.3. By
(A1.1), for each feedback control u(·), there is a unique invariant measure which will be denoted by the row vector π(u) = {π(x, u), x ∈ S}, and (1.1) does not depend on x. As noted in Subsection 2.1.3, there is a vector valued function W(u) with values {W(x, u), x ∈ S}, such that (W(x, u), γ(u)) satisfy

W(x, u) = Σ_y p(x, y|u(x)) W(y, u) + C(x, u(x)) − γ(u).   (1.2)
The solution to (1.2) is not unique, since if (W(u), γ(u)) is a solution and k a constant, then the function defined by W(x, u) + k together with the same γ(u) is also a solution. However, it is true that if we normalize the set of possible solutions by selecting some state x_0 ∈ S and insisting that W(x_0, u) = K, for some given constant K, then the solution (W(u), γ(u)) is unique. Generally, K is chosen to equal either zero or γ(u).
Let (W̄, γ̄) satisfy

W̄(x) = Σ_y p(x, y|u(x)) W̄(y) + C(x, u(x)) − γ̄.   (1.4)

Then γ̄ = γ(u). To see this, multiply each side of (1.4) on the left by π(x, u), sum over x, and use the fact that π(u)P(u) = π(u).
(ii): (A1.2) holds and S is a single recurrent and aperiodic class for each feedback control.
(iii): (A1.1) and (A1.2) hold.
Then, there exists a unique solution γ̄ to (1.5).
The proof under (i) is [11, Proposition 4, Chapter 7]. The proof under (iii) is Theorem 8.12 in [126]. The proof under the stronger condition (ii) is in Theorem 3.1 below.
Theorem 1.2. Assume (A1.2) and suppose that there is a solution (V, γ̄) to

V = min_{u(x)∈U} [P(u)V + C(u) − γ̄ e],   (1.6)

where e = (1, ..., 1)′. Then there is a feedback control ū(·) such that γ̄ = γ(ū). Consider any sequence u = {u_0(·), u_1(·), ...} of feedback controls. Then the associated cost is no smaller than γ̄. The result remains true for any admissible control sequence. Thus γ̄ is the minimal cost.
Proof. Suppose that there are V and γ̄ satisfying (1.6). Let the minimum be taken on at ū(x) and let u(·) be another feedback control. Then

V = P(ū)V + C(ū) − γ̄ e ≤ P(u)V + C(u) − γ̄ e.   (1.6′)

Now, let u = {u_0(·), u_1(·), ...} be any sequence of feedback controls. Then, iterating (1.6′) yields

V ≤ P(u_0)P(u_1)···P(u_n)V + Σ_{i=0}^{n} P(u_0)···P(u_{i−1})[C(u_i) − γ̄ e],
which proves the optimality with respect to all sequences of feedback con-
trols. The additional details needed to get the result for an arbitrary ad-
missible control sequence will not be given. See [88, Section 6.6], or [131] .
•
A Centered Form of the Dynamical Equations. There are alter-
native "centered" forms of (1.2) and (1.6) which will be useful in the
next few sections. Suppose that, for a feedback control u(·), we normal-
ize the function W(u) in (1.2) by choosing a particular state x_0 and setting W(x_0, u) = γ(u). Define the |S|-dimensional column vector C(u) = {C(x, u(x)), x ∈ S}. Then we can write (1.2) as
W(x, N, u) = E_x^u Σ_{i=0}^{N} C(ξ_i, u(ξ_i)),   (2.1)
where V(x, 0) = 0. For large values of N, the cost function (2.1) differs from (1.1) mainly in the normalization by N. Thus it is intuitively reasonable to expect that the optimal control for the ergodic problem will be well approximated by that for the finite time problem if N is large, and that the normalized cost function V(x, N)/N will converge to γ̄ as N → ∞.
Under appropriate conditions this does happen, and we now give a more
precise statement of the result.
Let {u_n(·), n < ∞} be a sequence of admissible feedback controls, where u_i(·) is to be used at time i. Define the (n + 1)-step probability transition matrix

{p^n(x, y|u_0, ..., u_n), x, y ∈ S} = P(u_0)···P(u_n).

A2.1. There is a state x_0, an n_0 < ∞, and an ε_0 > 0 such that for all u_0(·), ..., u_{n_0}(·) and all x ∈ S,

p^{n_0}(x, x_0|u_0, ..., u_{n_0}) ≥ ε_0.
Note the similarity between (2.2) and (2.3). Equation (2.3) is essentially
(2.2) with a centering or normalization at each time step in order to keep
the cost from blowing up.
Remarks. The result under condition (i) is due to White ([11],[88, Section
6.5], [126, Theorem 8.18], [154]) and that under (ii) is in [53]. See the survey
[126] for additional information and references. Note that the algorithm
defined by (2.3) is a Jacobi type iteration. There does not seem to be a
Gauss-Seidel version known at this time, although there is for the centered
form introduced below. Because the approximation in policy space type
methods seem to be preferable for most problems, (2.3) is not widely used.
or, alternatively, by

u_{n+1}(x) = arg min_{a∈U} [ Σ_y p(x, y|a) w(y, u_n) + C(x, a) ].   (3.2)
We can now state the following theorem. Other results are in the references.
Theorem 3.1. Assume (A1.2), and that S is a single recurrent and aperiodic class under each feedback control. Then there is γ̄ such that γ(u_n) ↓ γ̄, and there is a V such that (V, γ̄) satisfy (1.6), or equivalently, (1.8).
Suppose that, for some sequence {u_n(·)}, the absolute values of the sequence of subdominant eigenvalues of P(u_n) converge to one. We can suppose (by choosing a subsequence if necessary) that u_n(·) converges to a control u(·). Then P(u) has at least two eigenvalues whose norms equal one, a contradiction to the condition that there is a single recurrent class under u(·). It follows from this assertion that the W(u) defined by (1.3) are bounded uniformly in u(·).
Let n_k → ∞ be a sequence such that u_{n_k}(·), u_{n_k+1}(·), W(u_{n_k}), respectively, converge to limits ū(·), û(·), W̄. We have

γ̄ e = (P(ū) − I)W̄ + C(ū),

which implies that γ̄ = γ(ū). On the other hand, by (3.1),

min_{u(x)∈U} [(P(u) − I)W(u_{n_k}) + C(u)] = (P(u_{n_k+1}) − I)W(u_{n_k}) + C(u_{n_k+1}).

Thus

min_{u(x)∈U} [(P(u) − I)W̄ + C(u)] = (P(û) − I)W̄ + C(û).   (*)
Because there are no transient states by hypothesis, π(x, u) > 0 for all x ∈ S. Now multiply the left and right sides by π(ū) to get γ̄ = γ(ū) ≥ γ(û), where the inequality is strict unless (*) is an equality. But γ(ū) = γ(û) = lim_n γ(u_n), which implies that (*) is an equality. •
Note that the x_0-th rows of P_e(u) and C_e(u) are zero. Given the value of w(u), both W(u) and γ(u) can be calculated. The key property of the matrix P_e(u) is the fact that its spectral radius is less than unity, so that the
iterative algorithms of Chapter 6 can be used for the solution of (3.4). This
centering idea seems to have originated in [12] in their work on aggregation
algorithms for the ergodic problem.
Lemma 3.2. If the chain under the feedback control u(·) contains only transient states and a single aperiodic recurrent class, then the spectral radius of P_e(u) is less than unity.
dx = cx dt + dw,   c > 0,

on the interval [0, 1], with instantaneously reflecting boundaries. The state space for the Markov chain approximation is {h, 2h, ..., 1 − h}, where 1/h is assumed to be an integer. Let ch ≤ 1, and define x_0 = n_0h for some integer n_0. Define the transition probabilities p^h(x, y) by the "central finite difference formula" (see Section 5.1)

p(x, x + h) = (1 + chx)/2,   x = h, ..., 1 − 2h,
p(x, x − h) = (1 − chx)/2,   x = 2h, ..., 1 − h,
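For a fixed control, the ergodic cost can be computed directly from the invariant measure of such a small chain. The sketch below builds the chain above with an assumed simple reflection at the endpoint states (the text's exact boundary transitions are not reproduced here), computes the invariant measure as the left Perron eigenvector, and evaluates the stationary cost for an arbitrary running cost k(·).

```python
import numpy as np

c, h = 1.0, 1.0 / 20
x = np.arange(h, 1.0, h)                 # states h, 2h, ..., 1-h
n = len(x)
P = np.zeros((n, n))
for i, xi in enumerate(x):
    up, dn = (1 + c * h * xi) / 2, (1 - c * h * xi) / 2
    P[i, min(i + 1, n - 1)] += up        # mass leaving [h, 1-h] stays put
    P[i, max(i - 1, 0)] += dn            # (an assumed reflection rule)

# Invariant measure: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi /= pi.sum()

k = x**2                                 # an arbitrary running cost k(x)
print("gamma =", pi @ k)                 # stationary cost sum_x pi(x) k(x)
```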
given. Assumptions (A1.1) and (A1.2) will be used for the approximating chains for the values of h of interest. In this section, we let the interpolation interval Δt^h(x, a) depend on the state and control in order to get a better understanding of the relationship between the approximating chain and the original diffusion process. It will be seen later that the approximation in policy space algorithm requires that the intervals not depend on either the state or control. In Section 7.7, the transition probabilities will be modified so that the intervals are not state dependent.
Suppose the setup for a reflecting jump diffusion which was given in Subsection 5.8.3. The process x(·) is constrained to stay in a compact set G which is the closure of its interior. G_h denotes the set of states for the approximating chain in G, and ∂G_h^+ denotes the reflecting boundary for the chain, which is disjoint from G_h. Define the state space S_h = G_h ∪ ∂G_h^+. Recall that Δt^h(x, a) = 0 for x ∈ ∂G_h^+. Recall the notation: u_n^h = u(ξ_n^h), Δt_n^h = Δt^h(ξ_n^h, u_n^h), and t_n^h = Σ_{i=0}^{n−1} Δt_i^h.
It will be seen below that the appropriate cost function for the Markov chain, under an admissible control sequence u = {u_0(·), u_1(·), ...}, is

γ^h(x, u) = limsup_N [ E_x^u Σ_{n=0}^{N−1} k(ξ_n^h, u_n^h) Δt_n^h ] / [ E_x^u Σ_{n=0}^{N−1} Δt_n^h ].   (5.1)

There can also be costs associated with being on the reflecting boundary, and an example will be given in Chapter 8. See also Section 7.8. In this introductory discussion, we wish to keep the formulation simple. If (5.1) does not depend on the initial condition, then we write it as γ^h(u). Suppose that the feedback control u(·) is used at each time step and let π^h(u) = {π^h(x, u), x ∈ S_h} denote the associated invariant measure. The ergodic theorem for Markov chains [23] yields
satisfies (5.3). As was the case in Section 7.1, the solution to (5.3) is unique up to an additive constant on the function W^h(u). Let (W^h, γ̄^h) satisfy (5.7). Note that the value is zero for the "instantaneous" reflecting states. Now, (5.2) can be written in the simpler form (5.8).
Recall the definition of the interpolation ξ^h(·) from Chapter 4, and define the interpolation u^h(·) here by: u^h(t) = u_n^h for t ∈ [t_n^h, t_{n+1}^h). The ergodic theorem for Markov chains also implies the identities (5.9) (all with probability one).
The numerical methods for the latter calculation converge much faster than
do the numerical methods which might be used for the calculation of the
invariant measure itself. Because of this, even if one is interested in the
invariant measure of x( ·), we might content ourselves with the values of
approximations to "stationary expectations" of a small number of func-
tions.
An interesting alternative for numerically approximating the invariant
measure for certain classes of "heavy traffic" problems is the QNET method
of [33, 35, 36, 65], although it is not applicable to state dependent or control
problems.
The fact that the infimum γ̄^h of the costs (together with an auxiliary function V^h) satisfies (5.10), as well as the convergence of the approximation in policy space algorithm, can be shown by the methods used in Theorems 1.2 and 3.1. Here we will show only that (5.10) is the correct dynamic programming equation. Suppose that there are V^h and γ̄^h satisfying
for all x ∈ S_h. Let the minimum be taken on at ū(x) and let u(·) be another feedback control. Then we claim that γ̄^h = γ^h(ū), which equals the minimum cost. Following the method used to deal with (1.6), the first equality implies that γ̄^h = γ^h(ū), as shown in connection with (1.4). Next, multiply the left and right sides of the above inequality by π^h(x, u), sum over x, and use the fact that π^h(u) is invariant for the transition matrix P^h(u). But the resulting inequality implies that γ^h(u) ≥ γ̄^h = γ^h(ū), which proves the claim.
Now repeat (5.12) for C^h(x, u(x)) = Δt^h(x, u(x)), yielding the new value (5.15).
Divide (5.14) by (5.15) to get γ^h(u). So we can calculate the cost under each control u(·). Nevertheless, to use the approximation in policy space method, we need to use a constant interpolation interval for x ∈ G_h. See Section 7.7.
which is just μ^h(x, u). Hence, μ^h(·) is an invariant measure for ψ^h(·) under control u(·). It is unique if there is a unique invariant measure for {ξ_n^h, n < ∞} under u(·).
The mean cost per unit time for ψ^h(·) under u(·) can be written as

lim_t (1/t) E_x^u ∫_0^t k(ψ^h(s), u(ψ^h(s))) ds = ∫ k(x, u(x)) μ^h(dx, u) = γ^h(u).   (6.2)
(6.3)

where the expectation in (6.3) is for the stationary process, under the control u(·). Again, we note that boundary costs can be added. See Section 7.8 and Chapter 8. Let us note the following without proof, because it partially justifies considering the process ψ^h(·): The equivalence of (6.2) and (6.3)
and a weak convergence argument can be used to show that the weak limits
of the invariant measures J.Lh(u) are invariant measures for the original
process x(·). See Chapter 11, where convergence results for the ergodic
cost problem will be given.
The interpolation ψ^h(·) is useful for the mathematical convergence analysis. But the computational algorithm is the same as that for the discrete
parameter chain. The recursive equations for the average cost per unit time under any given feedback control u(·) are the same for the discrete and the continuous parameter chains. By comparing (6.2) and (6.3) with (5.1) and (5.2), it is evident that the minimum of (5.1) for the discrete parameter chain equals the minimum of

limsup_t (1/t) E_x^u ∫_0^t k(ψ^h(s), u(s)) ds.
Because the reflecting states are instantaneous and their interpolation intervals are zero, they will be (easily) dealt with separately in Section 7.7.2 below.
dx_1 = x_2 dt,
dx_2 = (a_1x_1 + a_2x_2) dt + u dt + σ dw,
   (7.1)

where σ² ≥ 1 and |u(t)| ≤ 1. Let a_i ≤ 0, so that the system (7.1) is stable. For this illustrative example, the state space for x(·) will be the closed "square" G = [−B, B] × [−B, B], for some B > 0, and with a reflecting boundary.
It is sufficient for our purposes to let G_h be the regular h-grid (ℝ_h² ∩ G) on G. Let ∂G_h^+, the reflecting boundary for the chain, be the points in ℝ_h² − G which are at most a distance h from G in any coordinate direction.
The procedure is as follows: First, we obtain any locally consistent chain,
using any of the methods of Chapter 5. If the interpolation intervals are
not constant, we then use the ideas of Section 5.2 to eliminate the state
and control dependence.
For the first step for this case, and in order to illustrate one concrete example, let us follow the "operator splitting" procedure outlined in Subsection 5.2.3. Write x = (x_1, x_2). The differential operator of the process defined by (7.1) is

(σ²/2) ∂²/∂x_2² + x_2 ∂/∂x_1 + (a_1x_1 + a_2x_2) ∂/∂x_2 + u(x) ∂/∂x_2.   (7.2)
Define

p^h(x, x ± e_1h|a) = n^h(x, x ± e_1h|a) / Q^h(x),

where

n^h(x, x ± e_1h|a) = h x_2^±,

and

p^h(x, x ± e_2h|a) = n^h(x, x ± e_2h|a) / Q^h(x),

where

n^h(x, x ± e_2h|a) = σ²/2 ± ha/2 + h(a_1x_1 + a_2x_2)^±,
Q^h(x) = σ² + h|x_2| + h|a_1x_1 + a_2x_2|,
Δt^h(x) = h²/Q^h(x).

The transition probabilities from x to all nonlisted y are zero. The above constructed transition probabilities are locally consistent with the diffusion model (7.1).
Next, let us construct transition probabilities where the interpolation intervals are constant. This is easily done as follows, using the p^h(x, y|a) given above. Define

p̄^h(x, x ± e_ih|a) = n^h(x, x ± e_ih|a) / Q̄^h,   i = 1, 2,   (7.3)
p̄^h(x, x|a) = 1 − Q^h(x)/Q̄^h,
Δt̄^h(x) = Δt̄^h = h²/Q̄^h.
The chain with these transition probabilities and interpolation interval is also locally consistent with the diffusion (7.1). The transition probabilities (7.3) are used for x ∈ G_h. The transition probabilities for the reflecting states ∂G_h^+ are assumed to be locally consistent with the reflection directions, as in Section 5.7.
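The passage from the state dependent normalizers Q^h(x) to the constant Q̄^h in (7.3) amounts to rescaling the rows and placing the leftover mass on a self-transition. A generic sketch, with an invented two-state example:

```python
import numpy as np

def normalize_intervals(n_trans, Q):
    """Given unnormalized transition weights n_trans[x, y] with row sums
    Q[x] (so p(x,y) = n_trans[x,y]/Q[x] and dt(x) = h^2/Q[x]), build the
    chain (7.3) with a constant interval: divide by Q_bar = max_x Q[x] and
    put the leftover mass 1 - Q[x]/Q_bar on the self-transition."""
    Q_bar = Q.max()
    P = n_trans / Q_bar
    P[np.arange(len(Q)), np.arange(len(Q))] += 1.0 - Q / Q_bar
    return P, Q_bar          # the constant interval is then dt = h**2 / Q_bar

# Invented check: two states with different local normalizers.
n_trans = np.array([[0.0, 2.0], [1.0, 0.0]])
Q = n_trans.sum(axis=1)      # (2.0, 1.0)
P, Q_bar = normalize_intervals(n_trans, Q)
print(P)                     # second row gains self-transition mass 0.5
```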
Now, for a feedback control u(·), define the transition matrix for the "reduced" chain on G_h by (7.5), where

C^h(x, u(x)) = k(x, u(x)) Δt̄^h,   x ∈ G_h,
W^h(u) = {W^h(x, u), x ∈ G_h},

since we supposed that there is no cost on the boundary. Now follow the procedure which led to (3.4). Choose the centering value W^h(x_0, u) to satisfy

γ^h(u) = W^h(x_0, u) / Δt̄^h.   (7.4)
We will write the algorithm in terms of the p^h(x, y|a), since that is the starting data. Equation (7.8) will be rewritten in terms of p^h(x, y|a) to get the final result (7.11) and (7.12).
+ [k(x, u(x)) − k(x_0, u(x_0))] h²/Q̄^h,

where Q̄^h = sup_{x,a} Q^h(x, a). As will be seen below, it is not necessary to calculate Q̄^h.
w^h(x, u) = Σ_{y≠x} [ n^h(x, y|u(x)) − n^h(x_0, y|u(x_0)) ] w^h(y, u) / [ Q^h(x, u(x)) + n^h(x_0, x|u(x_0)) ]
+ [k(x, u(x)) − k(x_0, u(x_0))] h² / [ Q^h(x, u(x)) + n^h(x_0, x|u(x_0)) ].   (7.11)
Equation (5.3) continues to hold for x ∈ G_h. Let p^h(x, y|a), x ∈ ∂G_h^+, denote the controlled transition probabilities on the reflecting states. For x ∈ ∂G_h^+, (5.4) is replaced by
The function defined by (5.5) continues to satisfy (5.3) and (8.2) if the cost k(ξ_n^h, u_n^h)Δt_n^h is replaced by

k(ξ_n^h, u_n^h)Δt_n^h + k_0(ξ_n^h, u_n^h)h.
Now let us use the terminology of Section 7.7 above. Equations (7.8), (7.9) and (7.11) continue to hold for x ∈ G_h. Recall that x_0 does not communicate with states on ∂G_h^+ by assumption. For x ∈ ∂G_h^+, (7.12) is replaced by

w^h(x, u) = Σ_y p^h(x, y|u(x)) w^h(y, u) + k_0(x, u(x))h.   (8.3)
In fact, suppose that (7.8) and (8.3) hold. Recall that by (7.5b), and using the fact that x_0 does not communicate to ∂G_h^+,
Let π^h(u) = {π^h(x, u), x ∈ G_h ∪ ∂G_h^+} be the invariant measure for the chain {ξ_n^h, n < ∞} on the extended state space. Multiplying (7.8) and (8.3) by π^h(x, u) appropriately, adding, and using the invariance of π^h(u) as in Section 7.5 yields

W^h(x_0, u) = [ Σ_{x∈G_h} π^h(x, u) k(x, u(x)) Δt̄^h + Σ_{x∈∂G_h^+} π^h(x, u) k_0(x, u(x)) h ] / Σ_{x∈G_h} π^h(x, u).   (8.5)
Because Σ_{x∈G_h} π^h(x, u)Δt̄^h equals the mean interpolation interval in G (which is just Δt̄^h), the cost (8.1) equals W^h(x_0, u)/Δt̄^h, as previously.
Many of the process models which are used for purposes of analysis or con-
trol are approximations to the true physical model. Perhaps the dimension
of the actual physical model is very high, or it might be difficult to define a
manageable controlled dynamical (Markov) system model which describes
well the quantities of basic interest. Sometimes the sheer size of the problem
and the nature of the interactions of the component effects allows a good
approximation to be made, in the sense that some form of the central limit
theorem might be used to "summarize" or "aggregate" many of the ran-
dom influences and provide a good description of the quantities of interest.
Because these simpler or aggregate models will be used in an optimiza-
tion problem, we need to be sure that optimal or nearly optimal controls
(and the minimum value function, respectively) for the aggregated prob-
lem will also be nearly optimal for the actual physical problem (and a good
approximation to the associated minimum value function, respectively).
This chapter is concerned with two classes of problems where this av-
eraging effect can be used to simplify the model. The first class, the so-
called class of heavy traffic problems, originated in the study of uncontrolled
queueing systems [74, 113] and has applications to a broad class of such
systems which include certain communication and computer networks and
manufacturing systems. For these systems, "traffic" is heavy in the sense
that at some processors there is little idle time. The distributions of the
service and interarrival times might be dependent on the system state. The
dimension of the physical problem is usually enormous. With an appro-
priate scaling, a functional central limit theorem argument can be used to
show that the basic elements of the system can be well approximated by
0 = λ_d π_1 − λ_a π_0,
0 = λ_a π_{i−1} + λ_d π_{i+1} − (λ_a + λ_d) π_i,   i ≠ 0, N,   (1.1)
0 = λ_a π_{N−1} − λ_d π_N.
The system (1.1) is one of the few queueing system equations that can be
solved. Nevertheless, it is still quite difficult to calculate the distributions
at finite times for large N. The situation is considerably worse if we allow
the distributions of the service or interarrival times to be other than expo-
nential (even if we are only concerned with the stationary distributions),
or if services or arrivals can occur in batches. The server can break down or
be otherwise unavailable at random. If the arrival or service rates depend
on the state of the system (e.g., faster service or slower arrivals for longer
queues), then the analytic solution of the counterpart of (1.1) can be ob-
tained at present only in a few special cases, and even numerical solutions
are generally difficult to get if N is not small. The required computation
rapidly gets out of bounds if the system is a network of interconnected
queues (except for the stationary solutions to the so-called Jackson cases).
If a control is added to (1.1) (e.g., a control on the service rate), then the
resulting control problem has a very high dimensional state space, even
under the "exponential" distribution. One is strongly tempted to use some
sort of approximation method to simplify the model for the queueing pro-
cess. Such approximations normally require that "certain parameters" be
either small or large. It was recognized in the late 1960's [74, 113] that if the
service and arrival rates are close to each other, then by a suitable scaling,
such a simplifying approximation can be obtained, at least for simple sys-
tems, and the approximating process was a reflected Wiener process with
drift. Subsequently, the same type of result was shown to be true in a fairly
general setting [129] and for controlled problems as well [107, 118]. See [100]
for a comprehensive development and many examples. The "aggregated" or
limit models often yield excellent approximations for the physical systems
under realistic conditions [34, 36, 68, 130]. A further motivation for seeking
aggregative or simplified models is that some sort of stochastic evolution
equation and a Markov model are very helpful if the control problem is to
be treated.
Let us work in discrete time and suppose that only one arrival or depar-
ture event can occur at each discrete time. Since we are concerned with
an approximation result, we will consider a family of queueing problems
parameterized by ε, with the probability of an arrival at any time and the
probability of completion of any service, conditioned on all past data, being λ + b_a√ε and λ + b_d√ε, respectively, where the b_a and b_d can be either
positive or negative. With these values, the arrival and service rates are
within O(√ε) of one another. Marginal differences of the order of √ε in the
rates can make a considerable difference in the queue occupancy statistics.
As the traffic intensity increases (i.e., as ε → 0), the mean occupancies
increase, and it makes sense to scale the buffer size also. We let the buffer
size be scaled as Buffer Size = B/√ε, for some B > 0. If the buffer is of
smaller order, then the limit is concentrated at zero, and if it is of larger
order, it plays no role in the limit.
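A minimal Monte Carlo sketch may make the scaling concrete (all parameter values below are illustrative): it simulates the ε-parameterized queue just described and returns the scaled path X^ε(t) = √ε Q_{[t/ε]} defined later in this section.

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_queue_path(eps, lam=0.4, b_a=0.0, b_d=1.0, B=2.0, T=10.0):
    """Simulate the discrete-time queue with arrival probability
    lam + b_a*sqrt(eps), service-completion probability lam + b_d*sqrt(eps),
    and buffer B/sqrt(eps); return X^eps(t) = sqrt(eps) * Q_[t/eps]."""
    s = np.sqrt(eps)
    buffer = int(B / s)
    q = 0
    path = np.empty(int(T / eps))
    for n in range(len(path)):
        if rng.random() < lam + b_a * s and q < buffer:
            q += 1                      # arrival (lost if the buffer is full)
        if q > 0 and rng.random() < lam + b_d * s:
            q -= 1                      # real (non-fictitious) departure
        path[n] = s * q
    return path

# For small eps the path resembles a reflected Wiener process with drift
# b_a - b_d on [0, B], in line with the limit theorem sketched below.
x = scaled_queue_path(eps=1e-4)
```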
A simplifying convention concerning departures. Let Q_n^ε denote the number of customers waiting for or in service at discrete time n. There is a
convention which is used in writing the evolution equations which simpli-
fies the analysis considerably. In our current example, it is clear that the
arrival process is independent of the queue size and departure process, but
the departure process does depend on the state of the queue. In particular,
if the queue is empty then there can be no departure. The convention to be
used in writing the evolution equation is that even if the queue is empty,
the processor will keep working and sending out outputs at the usual rate.
But to keep the equations correct, a correction or "reflection" term (the
dY terms below) will be subtracted from the departure process whenever
such a "fictitious" output occurs. This device was used by [69, 74, 129] and
others. While it is not needed [100, Chapter 5], it makes our discussion of
the motivational example a little simpler. If there is an arrival in the midst
of such a "fictitious" interval, it is then supposed that the service time of
this arrival is just the residual service time for the current service interval.
This convention does not affect the form of the limit [74, 100, 107].
where the ΔA_m^ε and ΔD_m^ε, respectively, are the number (zero or one) of
arrivals or departures, respectively, at time m. Keeping in mind the above
convention concerning departures and fictitious departures, the ΔD_m^ε are
the indicators of departure events computed as though the queue were never empty.
The term ΔJ_m^ε corrects for a "fictitious" output at time m, that is, for an
output which is counted even though the queue is empty at that time. The term ΔI_m^ε subtracts any
input which arrives at time m if the buffer is full at that time.
and similarly for ξ_m^{d,ε}. Thus, the partial sums of the {ξ_m^{α,ε}, m < ∞} form a
martingale sequence for α equal to a or d. Note that
Let [t/ε] denote the integer part of t/ε. Define the continuous parameter
scaled queue length process X^ε(·) by

$$X^\varepsilon(t) = \sqrt{\varepsilon}\, Q^\varepsilon_{[t/\varepsilon]}.$$

Define ΔY_m^ε = √ε ΔJ_m^ε and ΔU_m^ε = √ε ΔI_m^ε. Then, letting the ratio t/ε
henceforth denote the integer part only, we can write
$$X^\varepsilon(t) = X^\varepsilon(0) + \sqrt{\varepsilon}\sum_{m=0}^{t/\varepsilon-1}\big(\Delta A_m^\varepsilon - E_m\Delta A_m^\varepsilon\big) - \sqrt{\varepsilon}\sum_{m=0}^{t/\varepsilon-1}\big(\Delta D_m^\varepsilon - E_m\Delta D_m^\varepsilon\big) + (b_a - b_d)\,t + Y^\varepsilon(t) - U^\varepsilon(t). \tag{1.5}$$
For motivational purposes, note first that the first two sums in (1.5)
tend to normally distributed random variables with mean zero and variance
λ(1−λ)t as ε → 0. More generally, when considered as functions of t, they
converge weakly to mutually independent Wiener processes w_a(·), w_d(·),
each with variance λ(1−λ)t. If X^ε(0) converges weakly (i.e., in distribution;
see Chapter 9 for the definitions) to a random variable X(0) as ε → 0, then
the sequence of processes defined in (1.5) converges weakly to limits that
satisfy
$$X(t) = X(0) + (b_a - b_d)\,t + w_a(t) - w_d(t) + Y(t) - U(t). \tag{1.6}$$

A discounted cost function for the physical problem is

$$W^\varepsilon(x,u) = E_x^u \sum_{n=0}^{\infty} e^{-\beta n\varepsilon}\, k\big(X^\varepsilon(n\varepsilon), u_n\big)\,\varepsilon + E_x^u \sum_{n=0}^{\infty} e^{-\beta n\varepsilon}\,\Delta U_n^\varepsilon, \tag{1.7}$$
where β > 0. This cost weighs the loss of a customer heavily relative to the
cost of control or the cost of the waiting time. Define V^ε(x) = inf_u W^ε(x,u),
where the infimum is over all admissible controls. The appropriate controlled form of (1.6) and the associated cost are
$$X(t) = X(0) + (b_a - b_d)\,t + \int_0^t \big[u_a(s) - u_d(s)\big]\,ds + w_a(t) - w_d(t) + Y(t) - U(t), \tag{1.8}$$

$$W(x,u) = E_x^u \int_0^\infty e^{-\beta t}\, k(X(t), u(t))\,dt + E_x^u \int_0^\infty e^{-\beta t}\,dU(t). \tag{1.9}$$
Let V(x) = inf_u W(x,u), where the infimum is over all "admissible" controls for (1.8). Loosely speaking, what we mean by admissible (see Chapters
1 or 9 for a fuller discussion) is that the controls satisfy |u_α(t)| ≤ ū_α for α
equal to a or d, and the u_α(t) are independent of the future of the Wiener
processes in the sense that u_α(t) is independent of {w_a(t+s) − w_a(t), w_d(t+s) − w_d(t), s ≥ 0}. It can be shown that V^ε(x) → V(x) as ε → 0, and that
continuous "nearly optimal" controls for the limit problem are also "nearly
optimal" when used on the physical problem. Similar results for related
but more complex problems are in [3, 4, 94, 86, 100, 103, 107, 118].
These representations and approximations (1.8) and (1.9) hold under
quite broad conditions, as can be seen from the references. The interarrival
or service intervals need not be exponentially or geometrically distributed,
and only the first and second moments of their distributions appear in the
limit equation. "Batch" arrivals and services can also be introduced. Our
aim here is to motivate the limit system equations only, and the reader is
referred to the literature for further details. The numerical problem consists
of computing an approximation to the optimal cost V(x) and associated
control (and then applying the obtained control to the physical problem).
Note that the cost structure is different from that used for the problems in
Chapter 3, because the process U(·), which is included in the cost, is not
differentiable.
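Since the reflection terms Y(·) and U(·) enter both the dynamics and the cost, it may help to see the reflection mechanism in isolation. The following is a minimal sketch of the discrete two-sided Skorokhod map on [0, B]; the driving noise, drift, and parameter values are illustrative only.

```python
import numpy as np

def reflect_two_sided(x0, increments, B):
    """Discrete two-sided Skorokhod problem on [0, B]: drive X by the given
    increments and record the lower (Y) and upper (U) reflection terms."""
    X, Y, U = x0, 0.0, 0.0
    xs = [X]
    for dz in increments:
        X += dz
        if X < 0.0:          # push up at the lower boundary
            Y += -X
            X = 0.0
        elif X > B:          # lost input: push down at the upper boundary
            U += X - B
            X = B
        xs.append(X)
    return np.array(xs), Y, U

# Driving noise approximating (b_a - b_d) dt + dw_a - dw_d with lam = 0.4:
rng = np.random.default_rng(1)
dt = 1e-3
dz = -0.5 * dt + np.sqrt(2 * 0.4 * 0.6 * dt) * rng.standard_normal(10_000)
X, Y, U = reflect_two_sided(x0=1.0, increments=dz, B=2.0)
```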
Let the cost be (1.7) with k_1 ΔU_m^ε + k_2 ΔF_m^ε replacing ΔU_m^ε. The function

$$F^\varepsilon(t) = \sum_{m=0}^{t/\varepsilon-1} \Delta F_m^\varepsilon$$
The Input-Output Equations. The mass balance equation for this sys-
tem [analogous to (1.2)] is
$$
\begin{aligned}
Q_n^{i,\varepsilon} = Q_0^{i,\varepsilon}
&+ \sum_{m=0}^{n-1}(\text{arrivals to } P_i \text{ from the exterior at time } m)\\
&+ \sum_{j\neq 0}\sum_{m=0}^{n-1}(\text{arrivals to } P_i \text{ from } P_j \text{ at time } m)\\
&- \sum_{j}\sum_{m=0}^{n-1}(\text{departures from } P_i \text{ to } P_j \text{ at time } m)\\
&+ \sum_{j}\sum_{m=0}^{n-1}(\text{corrections for fictitious departures from } P_i \text{ to } P_j \text{ at } m)\\
&- \sum_{j\neq 0}\sum_{m=0}^{n-1}(\text{corrections for fictitious departures from } P_j \text{ to } P_i \text{ at } m)\\
&- \sum_{m=0}^{n-1}(\text{corrections for lost inputs due to a full buffer at } P_i \text{ at } m)
\end{aligned}
$$
processors "keep processing" even when there are no customers, and the
fictitious outputs thus created are compensated for by a cancelation or
reflection terms, which are the various Y-terms below. Define the scaled
occupancies X_n^{i,ε} = √ε Q_n^{i,ε}, and let X_n^ε denote the vector with components
{X_n^{i,ε}, i ≤ K}. In general, for any sequence {z_n, n < ∞}, define the continuous time parameter interpolation z^ε(·) by z^ε(t) = z_n on the interval
[nε, nε + ε). Define ΔY_m^{ij,ε} = √ε times the indicator of the event that a
fictitious departure occurred at P_i at time m and it was sent to P_j. Let
ΔA_n^{i,ε} and ΔD_n^{ij,ε} be the indicators of the events that there is an external
arrival at P_i and a departure from P_i to P_j at time n, respectively. Let
ΔU_n^{i,ε} denote the indicator of the event that there is an arrival to P_i at
time n which is rejected due to a full buffer. Rewrite the above equation
with the obvious notation, and where t/ε is used to denote the integer part.
The Heavy Traffic and Other Assumptions. In keeping with the in-
tuitive idea of heavy traffic, the average total input rate to each processor
from all sources combined will be close to the service rate of that processor.
As ε → 0, the average input and service rates for each processor converge
to each other. The rates will be introduced via their "inverses," the (con-
ditional) means of the interarrival or service intervals. In particular, we
suppose that:
Note the analogue to the form used in Example 2. The g_{ai} and g_{di} are the
dominant parts of the rates of arrival (for external arrivals) and service,
respectively, at P_i. We also suppose that the conditional variances of these
random intervals, given the "past data," are also continuous functions of
the current state, modulo an error which goes to zero as ε → 0. The mathematical proof of convergence also needs the additional assumption that the
set of the squares of these intervals is uniformly integrable in n, ε [100].
In order to operate in the heavy traffic environment, it is necessary that
the dominant parts of the mean arrival and service rates be equal for each
processor. This implies that
If (1.13) does not hold, then the scaled queue length process at some processor will always be either near zero or at the upper limit for small ε. The
relations (1.12) and (1.13) are known as the heavy traffic assumption.
and

$$Y^{i,\varepsilon}(t) = \sum_{j\neq 0} Y^{ij,\varepsilon}(t).$$
Then it can be shown [100, 107, 118] that

$$X^\varepsilon(t) = X^\varepsilon(0) + H^\varepsilon(t) + M^\varepsilon(t) + (I - P')\,Y^\varepsilon(t) - U^\varepsilon(t) + \text{``small error terms''}. \tag{1.14}$$
The ith component of H^ε(t) has the form
where
M(·) is a stochastic integral with respect to some Wiener process w(·), and
can be written in the form
and the reflection terms Y (·) and U (·) have the properties described above
in connection with (1.14). Also X(·), Y(·) and U(·) are nonanticipative with
respect to w(·). Equation (1.15) describes the limit process which we wished
to motivate. Note that it is in the form of the Skorokhod Problem. Numerical methods for such systems and their controlled forms are of increasing
interest. The Markov chain approximation method is easily adapted for use
on such processes.
Cost Functions for (1.11) and (1.16). Let G denote the state space
[0, B_1] × [0, B_2] for (1.16). In order to illustrate the scaling which is needed
for the cost function for the physical system (1.11), we introduce one particular but interesting case. We now restrict attention to a two dimensional
example for notational simplicity. Let j ≠ i, let k(·) be a continuous
function, and let the k_i be positive numbers. Define

$$\Delta L_m^{i,\varepsilon} = \Delta A_m^{i,\varepsilon}\, I_{\{X_m^{i,\varepsilon} = B_i\}} + \Delta D_m^{ji,\varepsilon}\, I_{\{X_m^{i,\varepsilon} = B_i,\ X_m^{j,\varepsilon} \neq 0\}}, \tag{1.17}$$
$$\sum_{m=0}^{\infty} e^{-m\beta\varepsilon}\,\big[\,\cdots\,\big]$$
converges weakly to the zero process. Hence, the limit form of (1.18), which
is the cost for (1.16), is
decreases due to lost inputs to P_i when P_{0i} or some P_{ji}, j ≠ 0, is shut off,
and increases due to P_i being shut down while some input is not shut off.
These terms give rise to the impulsive control terms of the limit model.
The directions of the segments of the impulsive controls in the limit model
depend on which combination of links or processors are turned off. See [107]
for more detail.
associated with each of the arrivals to Po, but that this can be changed by
Po with an associated profit or loss. For motivation, consider the following
two particular cases.
$$W^\varepsilon(x, F^\varepsilon) = E_x^{F^\varepsilon} \int_0^\infty e^{-\beta t}\, k(X^\varepsilon(t))\,dt + E_x^{F^\varepsilon} \int_0^\infty e^{-\beta t}\big[k_1\,dU^{1,\varepsilon}(t) + k_2\,dU^{2,\varepsilon}(t)\big]. \tag{1.20}$$
Under the heavy traffic assumptions of the type used in Example 4, (1.20)
can be approximated [118] by the "limit system"
$$X^i(t) = X^i(0) + H^i(t) + M^i(t) + F^{ji}(t) - F^{ij}(t) + Y^i(t) - p_{ji}\,Y^j(t) - U^i(t), \tag{1.22}$$
where i ≠ j. The limit cost functional is

$$W(x,F) = E_x^F \int_0^\infty e^{-\beta t}\, k(X(t))\,dt + E_x^F \int_0^\infty e^{-\beta t}\big[k_1\,dU^1(t) + k_2\,dU^2(t)\big]. \tag{1.23}$$
Here the H(·) and M(·) terms take the forms of Example 4, with the
appropriate b(·) and Σ(·) functions. The term H(·) is the (scaled) marginal
difference between the input and service rates, and Σ(·) depends on the
"randomness" of the arrival and service processes. The term

is the control term for the limit system. The F^{ij}(·) are processes whose
paths are nonnegative, nondecreasing, and right continuous. They are "singular" controls, and represent the limits of the reassignments. In this problem, F^1(·) = −F^2(·), but that is not necessarily the case in general. Thus,
An Extension. For use in Section 8.3, let us write the following K-dimensional extension of the model (1.22) and (1.23):

$$F(t) = \sum_{i=1}^{K} v_i\, F^i(t),$$

$$W(x, F) = E_x^F \int_0^\infty e^{-\beta t}\, k(X(t))\,dt + \cdots
$$
Next, subtracting V(x) from each side of (1.24) and formally expanding the
terms yields

$$\mathcal{L}^0 V(x) + k(x) - \beta V(x) \ge 0,$$
$$V_x'(x)\,v_i + q_i \ge 0, \quad i = 1, \ldots, K,$$

and at each point x, at least one of these K + 1 terms equals zero. Thus, we
formally have
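The display (1.25) is not reproduced above, but the two families of inequalities, together with the complementarity condition, are commonly collected into a single variational inequality. A plausible compact form, consistent with the conditions just stated, is

$$\min\Big\{\mathcal{L}^0 V(x) + k(x) - \beta V(x),\ \min_{1 \le i \le K}\big[V_x'(x)\,v_i + q_i\big]\Big\} = 0.$$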
8.1.7 Example 7
An interesting problem in portfolio selection which involves a combination
of singular and ordinary control is in [37]. Let x = (x_0, x_1), where x_0 ≥ 0
is the bank account balance and x_1 ≥ 0 the amount invested in stocks. Let
U(t) [respectively, L(t)] denote the total value of stock sales (respectively,
purchases) by time t, and let c(·) be the "consumption rate." There are
transaction costs for sales (respectively, purchases): one pays a fraction
μ [respectively, λ] of the transaction amount. "Infinitesimal" transactions
are allowed and the model is

$$dx_0 = (r_0 x_0 - c)\,dt - (1 + \lambda)\,dL + (1 - \mu)\,dU, \tag{1.26}$$
$$dx_1 = r_1 x_1\,dt + \sigma x_1\,dw + dL - dU,$$

where r_0 and r_1 are the bank interest rate and the mean rate of increase
of the value of the stocks. The controls are u = (c, L, U).
For a suitable utility function, one wishes to maximize the profit
$$W(x,u) = E_x^u \int_0^\infty e^{-\beta t}\, k(c(t))\,dt. \tag{1.27}$$

In [37], the form k(c) = c^γ for γ ∈ (0, 1) was used, and this allowed the
authors to get an (essentially) explicit analytic solution in terms of the
ratio x_0/x_1. For other types of utility functions or processes, a numerical
method might be needed.
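To give a feel for what simulating (1.26) involves, here is a minimal Euler-Maruyama sketch. The band policy and all parameter values are hypothetical stand-ins, not the optimal policy of [37]; the trade sizes are solved so that a trade returns the state exactly to the edge of the band after transaction costs.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_portfolio(x0, x1, T=10.0, dt=1e-3, r0=0.03, r1=0.08, sigma=0.3,
                       lam=0.01, mu=0.01, c_rate=0.05, lo=0.8, hi=1.2):
    """Euler-Maruyama simulation of (1.26) under an illustrative band policy:
    consume at rate c_rate*x0 and trade only when x1/x0 leaves [lo, hi]."""
    for _ in range(int(T / dt)):
        c = c_rate * x0
        dw = np.sqrt(dt) * rng.standard_normal()
        x0 += (r0 * x0 - c) * dt
        x1 += r1 * x1 * dt + sigma * x1 * dw
        if x1 > hi * x0:    # sell stock: dU > 0, fraction mu of the sale lost
            dU = (x1 - hi * x0) / (1.0 + hi * (1.0 - mu))
            x0 += (1.0 - mu) * dU
            x1 -= dU
        elif x1 < lo * x0:  # buy stock: dL > 0, fraction lam of the buy lost
            dL = (lo * x0 - x1) / (1.0 + lo * (1.0 + lam))
            x0 -= (1.0 + lam) * dL
            x1 += dL
    return x0, x1

print(simulate_portfolio(x0=1.0, x1=1.0))
```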
Comments. Only a small sample of the many types of heavy traffic and
singular control problems has been considered. [139] treats a singular con-
trol problem where there are infinitely many directions of control. The nu-
merical method for the problem given in [106] is a straightforward extension
of that given here. [67] and [153] treat a problem which arises in heavy traf-
fic modeling, where there are several classes of customers, a "throughput"
constraint, and an ergodic cost criterion. A numerical method is developed
in [106], which gives a general approach to the Markov chain approximation
method for the ergodic cost problem with a singular control. Forms of the
reflecting diffusions which arise as heavy traffic limits of the "trunk line"
problems in telephone routing can also be dealt with [86, 110]. Ergodic
costs can be handled for all of the problems of this chapter, except for the
singular control problem.
the form of the reflection terms (I- P')Y(t) and -U(t). We next show
how to read the correct reflection directions from (1.15) or (1.16) (they are
the same for both systems). Refer to Figure 8.3a.
more closely (but still heuristically) to see how these reflection directions
are actually obtained.
Hence, the reflection direction is (−p_{21}, 1) for the boundary points in ques-
tion.
Next, let us repeat this procedure for the corner point (0,0). Let X^2(s) =
X^1(s) = 0 on the interval [t, t + δ), and suppose that ΔR^i < 0, i = 1, 2.
Then

$$\begin{pmatrix} 1 & -p_{21} \\ -p_{12} & 1 \end{pmatrix} \begin{pmatrix} \Delta Y^1 \\ \Delta Y^2 \end{pmatrix} = - \begin{pmatrix} \Delta R^1 \\ \Delta R^2 \end{pmatrix}.$$
This implies that the set of reflection directions at the corner is the convex
hull of those at the adjoining sides.
Similarly for all points x such that x_2 = −h and x_1 ∈ [0, B_1]. The procedure
is analogous for the left hand boundary. The assignment can be completed
by taking all other points on ∂G_h^+ to the nearest point on G_h. This set of
rules gives us a natural and locally consistent approximation.
[Figure 8.3b: the grid near the boundary of G, showing the points x_3, x_4, x_5 used in the Case 1 and Case 2 constructions.]
The method of Case 1 is the most natural and is the one which will be
used.
Case 2. A second possibility for a locally consistent transition probability
at x_3 can be obtained by reversing the order of the "corrections" of Case 1.
Refer to Figure 8.3c. Let us correct for the overflow first, by taking x_3 to
x_6. Then correct for the fact that the first component of the state is still
negative by moving along the reflection direction (−p_{21}, 1) to the point x_7.
If p_{21} < 1, then x_7 is not a grid point and we need to randomize between
x_5 and x_4 in the usual way so as to achieve the desired mean value. It can
be verified that this choice of the transition probability at x_3 is also locally
consistent. It will yield the same asymptotic results as Case 1.
We note that, in the limit, if the driving noise is nondegenerate, then
the contribution of the corner points to the cost is zero, so that the actual
form of the cost used at the corners is asymptotically irrelevant.
Thus,

$$E \sup_{m \le n}\Big|\sum_{i=0}^{m} \Delta z_i^h\Big|^2 = O(h)\,E \sum_{i=0}^{n} \big|\Delta z_i^h\big| \tag{2.1}$$

and (5.7.4) holds.
Owing to the special rectangular shape of G, and using the Case 1 "decomposition" of the corners, for ξ_n^h = x ∈ ∂G_h^+ we can write Δz_n^h in the
form

$$\Delta z_n^h = \Delta Y_n^h - \Delta U_n^h, \tag{2.2}$$

where the ΔY_n^{h,i} (respectively, ΔU_n^{h,i}) are nonnegative and can increase only if
ξ_n^{h,i} < 0 (respectively, ξ_n^{h,i} > B_i). Because in our case only the "lower" reflection
terms might be randomized, the right side of (2.1) is bounded above by
O(h) E|Y_{n+1}^h|.
Define the interpolations u^h(t) = u_n^h on [τ_n^h, τ_{n+1}^h) and

and similarly define z^h(·). In Chapter 11, it will be shown that z^h(·) converges to the
zero process. Now, (5.7.5) takes the form

$$W^h(x,u) = E_x^u \sum_{n=0}^{\infty} e^{-\beta t_n^h}\, k(\xi_n^h, u_n^h)\,\Delta t_n^h + E_x^u \sum_{n=0}^{\infty} e^{-\beta t_n^h} \sum_i k_i\,\Delta U_n^{h,i}. \tag{2.5}$$

Let V^h(x) denote the infimum of W^h(x,u) over the admissible control
sequences u. For x ∈ G_h, the dynamic programming equation is
system and cost function will be (1.22') and (1.23'), respectively, and the
sets G, G_h, and ∂G_h^+ are the same as in Section 8.2. Let q_i > 0, k_i ≥ 0. Let
p^h(x, y) and Δt^h(x) be a transition probability and interpolation interval
which are locally consistent with the reflected diffusion (1.22') on the state
space S_h = G_h ∪ ∂G_h^+ when the control term F(·) is dropped. Without loss
of generality, let the v_i in (1.22') satisfy:

All the components of the vectors v_i are no greater than unity in absolute
value, and at least one component equals unity in absolute value.
The control in (1.22') can be viewed as a sequence of small impulses acting "instantaneously." With this in mind, we divide the possible behavior
of the approximating Markov chain into three classes:

(i) Suppose that ξ_n^h = x ∈ ∂G_h^+. Then we have a "reflection step," as in
Section 5.7 or Section 8.2, and Δt^h(x) = 0.

Otherwise, we are in the set G_h ⊂ G and there are two choices, only one
of which can be exercised at a time:

(ii) Do not exercise control and use p^h(x, y) and Δt^h(x) which are locally
consistent with the uncontrolled and unreflected diffusion.

(iii) Exercise control and choose the control as described in the next
paragraph.
where the ΔF_n^{h,i} are nonnegative. The impulsive control action is deter-
mined by the choice of the direction among the {vi} and the magnitude
of the impulse in the chosen direction. Let vi be the chosen direction. For
convenience in programming, it is preferable if the states move only "lo-
cally." For this reason, the value of the increment f::l.Ft:,i is chosen to take
the state x only to neighboring points. Thus, it equals h.
The procedure is illustrated in Figure 8.4 in a canonical case. In the
figure, x_i denotes the point of first intersection of the direction vector v_i
with the neighboring grid lines. We have hv_i = x_i − x. Obviously, the x_i
depend on h. If more choices were allowed for the increments, then the
asymptotic results for the optimal value function would be the same.
If the absolute value of each component of the vector v_i were either unity
or zero, then x_i would be a point in the regular h-grid G_h. Otherwise, it
is not. Analogously to what was done for the reflection problem in Section
5.7 or in Section 8.2, the actual transition is chosen by a randomization
which keeps the mean value at hv_i. Thus, for the example in the figure, at
a control step we will have E_{x,n} Δξ_n^h equaling either v_1 h or v_2 h, according
to the choice of the control direction. Write p^h(x, y|hv_i) for the transition
probability if the control direction is v_i. Let i = 2. Then the corresponding
transition probability is (see the figure for the notation)

(3.1)

The transition probabilities under the choice of direction v_1 are analogous.
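The body of (3.1) is not reproduced above, but the underlying idea is simple: the off-grid target x + hv_i is split over neighboring grid points with weights chosen so that the conditional mean displacement is exactly hv_i. The sketch below uses multilinear weights over the enclosing grid cell, which is one convenient way to achieve the required mean; the text's construction randomizes along neighboring grid lines, but any splitting with the correct mean preserves local consistency.

```python
import numpy as np

def randomized_transition(x, v, h):
    """Split the off-grid target x + h*v over the corners of its enclosing
    grid cell so that the mean displacement is exactly h*v
    (multilinear-interpolation weights)."""
    target = np.asarray(x, dtype=float) + h * np.asarray(v, dtype=float)
    base = np.floor(target / h) * h        # lower corner of the enclosing cell
    frac = (target - base) / h             # fractional position in [0, 1]^d
    d = len(frac)
    probs = {}
    for corner in range(2 ** d):
        bits = [(corner >> i) & 1 for i in range(d)]
        w = np.prod([f if b else 1.0 - f for f, b in zip(frac, bits)])
        if w > 0.0:
            y = tuple(base + h * np.array(bits))
            probs[y] = probs.get(y, 0.0) + w
    return probs   # {grid point: transition probability}

# E.g., direction v = (-0.4, 1.0) at x = (0.2, 0.1) with h = 0.1:
print(randomized_transition(x=(0.2, 0.1), v=(-0.4, 1.0), h=0.1))
```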
In analogy to (1.23'), if v_i is the chosen direction at time n, then an appropriate cost to assign to this control step is

$$q_i h = q_i\,\Delta F_n^{h,i}.$$
Since the partial sums of the randomization errors {ΔF̃_j^h} form a martingale sequence, we can
write

$$E \sup_{n \le N}\Big|\sum_{j=0}^{n-1}\Delta\tilde F_j^h\Big|^2 = O(h)\,E\sum_{j=0}^{N-1}\big|\Delta F_j^h\big|. \tag{3.4}$$

Equation (3.4) implies that the "error" goes to zero if the sequence of
costs due to the control is bounded.
$$W^h(x, F^h) = E_x^{F^h} \sum_{n=0}^{\infty} e^{-\beta t_n^h}\Big[k(\xi_n^h)\,\Delta t_n^h + \sum_i q_i\,\Delta F_n^{h,i} + \sum_i k_i\,\Delta U_n^{h,i}\Big]. \tag{3.5}$$
Then, for x ∈ G_h, the dynamic programming equation is

and any suitable approximation to the discount factor can be used. For
x ∈ ∂G_h^+, the dynamic programming equation is (2.6b).
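The displays (3.6) and (2.6b) are not reproduced above. To show the shape of such an iteration, here is a minimal one-dimensional sketch (a hypothetical analogue, with illustrative names and parameters, not the book's (3.6) itself): at each interior point one takes the minimum over an uncontrolled diffusion step and a singular-control impulse of one grid step, with costless instantaneous reflection at the boundary.

```python
import numpy as np

def solve_singular_control(b=-0.1, sigma=1.0, beta=0.5, q=1.5,
                           B=2.0, h=0.05, tol=1e-10, max_iter=100_000):
    """Jacobi-type iteration for a 1-d singular control analogue of (3.6):
    minimum over (i) a locally consistent diffusion step with running cost
    k(x) = x and (ii) an impulse of one grid step toward 0 at cost q*h.
    Boundary points are reflection steps (instantaneous, here costless)."""
    xs = np.arange(0.0, B + h / 2, h)
    k = xs.copy()                                 # running cost k(x) = x
    Q = sigma**2 + h * abs(b)
    p_up = (sigma**2 / 2 + h * max(b, 0.0)) / Q   # locally consistent chain
    p_dn = (sigma**2 / 2 + h * max(-b, 0.0)) / Q
    dt = h**2 / Q
    disc = np.exp(-beta * dt)
    V = np.zeros_like(xs)
    for _ in range(max_iter):
        W = V.copy()
        W[0], W[-1] = V[1], V[-2]                 # reflection steps
        diffuse = k[1:-1] * dt + disc * (p_up * V[2:] + p_dn * V[:-2])
        impulse = q * h + V[:-2]                  # push one step toward 0
        W[1:-1] = np.minimum(diffuse, impulse)
        if np.max(np.abs(W - V)) < tol:
            return xs, W
        V = W
    return xs, V

xs, V = solve_singular_control()
```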
A Relationship Between (3.6) and the Dynamic Programming
Equation for (1.22') and (1.23'). The formal dynamic programming
equation for (1.22') and (1.23') is (1.25), to which the reader is referred.
Suppose that the p^h(x, y) in (3.6) has been obtained via a finite difference approximation of the type discussed in Sections 5.1-5.3. Thus, we can
represent the sum

$$1 - c_2 = p^h\big(x,\, x - e_2 h \mid h v_2\big).$$
Subtract V^h(x) from both sides of (3.6). Then the inner minimum divided
by h equals (here i ≠ j)

This last expression is just the inner minimum in (1.25), once the superscript h is dropped.
9
Weak Convergence and the
Characterization of Processes
This chapter begins the section of the book devoted to the convergence
proofs and related matters. The purpose of the chapter is to introduce
the mathematical machinery that is needed in the later chapters. Because
particular applications are intended, we do not, in general, give the most
elaborate versions of the theorems to be presented.
Our method for proving convergence of numerical schemes is based on
the theory of weak convergence of probability measures. The theory of weak
convergence of probability measures provides a powerful extension of the
notion of convergence in distribution for finite dimensional random vari-
ables. For the particular problems of this book, the probability measures
are the induced measures defined on the path spaces of controlled processes.
This notion of convergence is important for our purposes, since our approx-
imations to value functions always have representations as expectations of
functionals of the controlled processes.
The first section of the chapter is concerned with general results and the
standard methodology used in weak convergence proofs. Included in this
section is a statement of the Skorokhod representation, which allows the
replacement of weak convergence of probability measures by convergence
with probability one of associated random variables (in an appropriate
topology) for the purposes of certain calculations. The usual application of
weak convergence requires a compactness result on the sequence of prob-
ability measures (to force convergence of subsequences), together with a
method of identification of limits. In Section 9.2 we present sufficient con-
ditions for the required compactness. The conditions will turn out to be
simple to verify for the problems considered in later chapters. Section 9.3
emphasizes the case when the limit is a Markov process; applications appear
in [52] and [93].
We can jump ahead a bit and indicate the reasons for our particular
interest in this notion of convergence. Consider for the moment the special
case of S = D^k[0,∞). An example of a random variable that takes values
in the space S is the uncontrolled diffusion process x(·) where

$$dx(t) = b(x(t))\,dt + \sigma(x(t))\,dw(t),$$

with x(0) = x given. Of course, x(·) also takes values in the smaller space
C^k[0,∞), and it is not a priori evident why we have chosen to use the
larger space D^k[0,∞). One important reason is that a basic compactness
result that will be needed in the approach described momentarily is easier
to prove for processes in D^k[0,∞).
Suppose that one is interested in calculating a quantity such as

$$W(x) = E_x\left[\int_0^\tau k(x(s))\,ds + g(x(\tau))\right],$$

for a suitable exit time τ. One would like an approximating process for which the analogous quantity
could be readily computed. Recall that some candidate approximations
were developed in Chapter 5. A finite state Markov chain {ξ_n^h, n < ∞} and
an interpolation interval Δt^h(x) satisfying the local consistency conditions
were constructed, and the process ξ^h(·) was then defined as a piecewise
constant interpolation of {ξ_n^h, n < ∞}. Thus, the processes {ξ^h(·), h > 0}
take values in D^k[0,∞). Then, if the sense in which the processes ξ^h(·)
approximate x(·) is actually ξ^h(·) ⇒ x(·), we may conclude W^h(x) →
W(x). This simple observation is the basis for the convergence proofs to
follow.
In order to make the procedure described above applicable in a broad
setting, we will need convenient methods of verifying whether or not any
given sequence of processes (equivalently sequence of measures) converges
weakly and also for identifying the limit process (equivalently limit mea-
sure).
d(s', s) < ε for some s ∈ A}. We define the Prohorov metric on P(S) by

$$\pi(P_1, P_2) = \inf\big\{\varepsilon > 0 : P_1(A) \le P_2(A^\varepsilon) + \varepsilon \text{ for all closed } A \in \mathcal{B}(S)\big\}.$$
We will see below that convergence in the Prohorov metric is equivalent to
weak convergence when S is separable. This equivalence makes the result
which follows significant for our purposes. A standard reference for most of
the material of this section is Billingsley [13], where proofs of the theorems
can be found.
Theorem 1.1. If S is complete and separable, then P(S) is complete and
separable.
Let { P"Y, 'Y E r} c P(S), where r is an arbitrary index set. The collection
of probability measures {P"Y, 'Y E r} is called tight if for each f > 0 there
exists a compact set KE c S such that
{1.1)
If the measures P"Y are the induced measures defined by some random
variables X"Y, then we will also refer to the collection {X"Y,"f E r} as tight.
The condition {1.1) then reads (in the special case where all the random
variables are defined on the same space)
inf P{X"Y EKE} 2_> 1-~:.
"YEr
Theorem 1.2. (Prohorov's Theorem) If S is complete and separable,
then a set {P_γ, γ ∈ Γ} ⊂ P(S) has compact closure in the Prohorov metric
if and only if {P_γ, γ ∈ Γ} is tight.
Assume that S is complete and separable and that a given sequence of
probability measures has compact closure with respect to the Prohorov
metric. It then follows from Theorem 1.1 that existence of a convergent
subsequence is guaranteed. In typical applications we will then show that
the limits of all convergent subsequences are the same. Arguing by contra-
diction, this will establish the convergence of the original sequence. Pro-
horov's theorem provides an effective method for verifying the compact
closure property. The usefulness of this result is in part due to the fact
that tightness can be formulated as a property of the random variables
associated to the measures Pn. Often these objects have representations
(e.g., SDE) which allow a convenient verification of the tightness property.
Remark 1.3. A simple corollary that will be useful for our purposes is
the following. Let S_1 and S_2 be complete and separable metric spaces, and
consider the space S = S_1 × S_2 with the usual product space topology. For
{P_γ, γ ∈ Γ} ⊂ P(S), let {P_{γ,1}, γ ∈ Γ} ⊂ P(S_1) and {P_{γ,2}, γ ∈ Γ} ⊂ P(S_2)
be defined by taking P_{γ,i} to be the marginal distribution of P_γ on S_i,
for i = 1, 2. Then {P_γ, γ ∈ Γ} is tight if and only if {P_{γ,1}, γ ∈ Γ} and
{P_{γ,2}, γ ∈ Γ} are tight.
(i) P_n ⇒ P;

(ii) limsup_n P_n(F) ≤ P(F) for closed sets F;

(iii) liminf_n P_n(O) ≥ P(O) for open sets O;

(iv) lim_n P_n(B) = P(B) for P-continuity sets B;

(v) π(P_n, P) → 0.
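A one-line example shows why (ii) and (iii) involve inequalities and why (iv) is restricted to continuity sets: take the point masses P_n = δ_{1/n} on the real line. Then

$$P_n = \delta_{1/n} \Rightarrow \delta_0, \qquad \liminf_n P_n\big((0,\infty)\big) = 1 > 0 = \delta_0\big((0,\infty)\big),$$

and the open set (0, ∞) is not a δ_0-continuity set, since its boundary point 0 carries mass one.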
However, Theorem 1.5 implies the convergence still holds, since the limit
process x(·) has continuous sample paths (w.p.1) and φ(·) → g(φ(T)) is
continuous at all φ(·) which are continuous.
Remark 1.6. Note that there is an obvious extension of the definition of
weak convergence of probability measures, in which the requirement that
the measures be probability measures is dropped. For T < ∞, let M_T(S)

P{X_n ∈ B} = P_n(B),
[85, Theorem 2.7b]. Recall that the random time τ is an F_t-stopping time
if {τ ≤ t} ∈ F_t for all t ∈ [0, ∞).
Theorem 2.1. Consider an arbitrary collection of processes {x_γ(·), γ ∈ Γ}
defined on the probability space (Ω, F, P) and taking values in D^k[0,∞).
Assume that for each rational t ∈ [0,∞) and δ > 0 there exists a compact
set K_{t,δ} ⊂ ℝ^k such that sup_{γ∈Γ} P{x_γ(t) ∉ K_{t,δ}} ≤ δ. Define F_t^γ
to be the σ-algebra generated by {x_γ(s), s ≤ t}. Let T_T^γ be the set of F_t^γ-stopping
times which are less than or equal to T w.p.1, and assume for each T ∈
[0,∞) that
If there is a filtration F_t defined on (Ω, F, P) such that M_f(t) is an F_t-martingale for all f ∈ C_0(ℝ) and θ ∈ C(Γ), then N(·) is an F_t-Poisson
random measure with intensity measure λ dt × Π(dρ) on [0,∞) × Γ.

Lastly, we note the important fact that a Wiener process and a Poisson
random measure that are defined on the same probability space and with
respect to the same filtration are mutually independent [75, Theorem 6.3].
9.4 An Example
It is instructive to see how the results outlined so far in this chapter yield
convergence of numerical schemes in a simple example. Because the ex-
ample is purely motivational, and since the full control problem will be
treated in Chapter 10, we consider a numerical problem for a simple one
dimensional diffusion. Although simple, the example will illustrate the way
that weak convergence methods can be used to justify numerical approxi-
mations. The problem considered can in some cases be more easily treated
by classical methods from numerical analysis. However, it will expose some
important points and will illustrate the typical use of the material pre-
sented in the last three sections. This section will also serve as a reference
point for our more involved use of the basic methods later in the book.
We consider a problem with a diffusion process that is the solution to
the one dimensional SDE
$$dx = b(x)\,dt + \sigma(x)\,dw, \qquad x(0) = x. \tag{4.1}$$
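Here is a minimal sketch of one locally consistent chain for (4.1), in the spirit of the finite difference constructions of Chapter 5; the specific formulas below are one standard choice, not the only one, and the numerical check is illustrative.

```python
import numpy as np

def chain_parameters(x, b, sigma, h):
    """One locally consistent choice for dx = b(x)dt + sigma(x)dw on an
    h-grid: upwind finite-difference transition probabilities and
    interpolation interval, so that the one-step conditional mean is
    b(x)*dt and the conditional variance is sigma(x)**2*dt + O(h*dt),
    as in (4.2) below."""
    bx, s2 = b(x), sigma(x) ** 2
    Q = s2 + h * abs(bx)                        # normalizing factor
    p_up = (s2 / 2.0 + h * max(bx, 0.0)) / Q    # P{x -> x + h}
    p_dn = (s2 / 2.0 + h * max(-bx, 0.0)) / Q   # P{x -> x - h}
    dt = h ** 2 / Q                             # interpolation interval
    return p_up, p_dn, dt

# Numerical check of local consistency at one point:
b, sigma, h = (lambda x: -0.5 * x), (lambda x: 1.0), 0.01
p_up, p_dn, dt = chain_parameters(0.3, b, sigma, h)
mean = h * (p_up - p_dn)
var = h ** 2 * (p_up + p_dn) - mean ** 2
print(mean / dt, var / dt)   # ~ b(0.3) and ~ sigma(0.3)**2 + O(h)
```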
The problem of interest will be the approximation of the function
Proof. We must show the assumptions of Theorem 2.1 hold for the process
ξ^h(·). Recall that E_n^h denotes expectation conditioned on F(ξ_i^h, i ≤ n)
and that Δξ_n^h = ξ_{n+1}^h − ξ_n^h. By construction, the chain satisfies the local
consistency conditions, and by a calculation given in Section 5.1

$$E_n^h\,\Delta\xi_n^h = b(\xi_n^h)\,\Delta t^h(\xi_n^h),$$
$$E_n^h\big(\Delta\xi_n^h - E_n^h\Delta\xi_n^h\big)^2 = \big[\sigma^2(\xi_n^h) + O(h)\big]\,\Delta t^h(\xi_n^h). \tag{4.2}$$
Let N^h(t) = max{n : t_n^h ≤ t}. Using (4.2), we compute

$$E_x\big|\xi^h(t) - x\big|^2 \le 2\,E_x\Big|\sum_{i=0}^{N^h(t)-1} b(\xi_i^h)\,\Delta t^h(\xi_i^h)\Big|^2 + 2\,E_x\sum_{i=0}^{N^h(t)-1}\big[\sigma^2(\xi_i^h) + O(h)\big]\,\Delta t^h(\xi_i^h) \le 2K^2 t^2 + 2\big[K^2 + O(h)\big]\,t,$$

where K is a bound for |b(x)| ∨ |σ(x)| for all x ∈ [0, B]. Together with
Chebyshev's inequality, this yields the first condition assumed in Theorem
2.1.
We must also prove (2.1). In the present context, this condition may be
rewritten as

where T_T^h is the set of F_t^h-stopping times which are less than or equal to
T w.p.1, and F_t^h is the σ-algebra generated by {ξ^h(s), s ≤ t} = {ξ_i^h, i ≤
N^h(t)}. The limit (4.3) can be proved by calculations similar to those of
the previous paragraph. Using the strong Markov property of the process
{ξ_i^h, i < ∞}, we have
for the limit process. The local consistency condition (4.2) gives

From the calculations which were used to prove the tightness of the sequence {ξ^h(·), h > 0}, we obtain tightness of the sequence {w^h(·), h > 0}.
Let {(ξ^h(·), w^h(·)), h > 0} be a convergent subsequence, and denote the
limit by (x(·), w(·)). We first prove that w(·) is indeed a Wiener process.
Let us fix t ≥ 0, τ > 0, q < ∞, t_i ∈ [0, t] with t_{i+1} > t_i for i ∈ {0, ..., q},
and any bounded continuous function H : ℝ^{2q} → ℝ. Let f ∈ C_b^2(ℝ) and
let L_w be the differential operator of the Wiener process, i.e., L_w f(x) =
(1/2) f_{xx}(x). From the definition of w^h(·),
$$
\begin{aligned}
f(w^h(t+\tau)) &- f(w^h(t)) - \int_t^{t+\tau} \mathcal{L}_w f(w^h(s))\,ds\\
&= \sum_{i=N^h(t)}^{N^h(t+\tau)-1}\big[f(w^h(t_{i+1}^h)) - f(w^h(t_i^h))\big] - \sum_{i=N^h(t)}^{N^h(t+\tau)-1}\tfrac{1}{2}\,f_{xx}(w^h(t_i^h))\,\Delta t^h(\xi_i^h) + O(h^2)\\
&= \sum_{i=N^h(t)}^{N^h(t+\tau)-1} f_x(w^h(t_i^h))\,\frac{\Delta\xi_i^h - E_i^h\Delta\xi_i^h}{\sigma(\xi_i^h)}
+ \frac{1}{2}\sum_{i=N^h(t)}^{N^h(t+\tau)-1} f_{xx}(w^h(t_i^h))\,\frac{\big[\Delta\xi_i^h - E_i^h\Delta\xi_i^h\big]^2}{\sigma^2(\xi_i^h)}\\
&\qquad - \frac{1}{2}\sum_{i=N^h(t)}^{N^h(t+\tau)-1} f_{xx}(w^h(t_i^h))\,\Delta t^h(\xi_i^h) + \varepsilon_h + O(h^2),
\end{aligned}
$$

where E|ε_h| → 0 as h → 0. By using this expression together with the
consistency condition (4.2) we have

$$E^h H\big(\xi^h(t_i), w^h(t_i), 1 \le i \le q\big) \times \cdots$$
Then ξ_δ^h(·) → x_δ(·) with probability one in D[0,∞). From the definition
$$\xi^h(t) - x = \int_0^t b(\xi^h(s))\,ds + \sum_{i=0}^{N^h(t)-1} \sigma(\xi_i^h)\,\big[w^h(t_{i+1}^h) - w^h(t_i^h)\big] + O(h^2).$$
Using this representation, the continuity and boundedness of b(·) and σ(·),
and the tightness of {ξ^h(·), h > 0}, we can write

$$\xi_\delta^h(t) - x = \int_0^t b(\xi_\delta^h(s))\,ds + \sum_{j=0}^{[t/\delta]} \sigma(\xi_\delta^h(j\delta))\,\big[w^h(j\delta + \delta) - w^h(j\delta)\big] + O(h^2) + \varepsilon_{\delta,t}^h,$$

where [s] denotes the integer part of s and where E|ε_{δ,t}^h| → 0 as δ → 0,
uniformly in h > 0 and t in any bounded interval. Taking the limit as h → 0
yields
$$x_\delta(t) - x = \int_0^t b(x_\delta(s))\,ds + \sum_{j=0}^{[t/\delta]} \sigma(x_\delta(j\delta))\,\big[w(j\delta + \delta) - w(j\delta)\big] + \varepsilon_{\delta,t},$$

where E|ε_{δ,t}| → 0 as δ → 0. Therefore, x(·) solves
A Topology for the Set [0,∞]. Because the cost W^h(x) also involves the
potentially unbounded stopping times τ_h, we must consider weak convergence for sequences of random variables with values in [0,∞]. We consider
[0,∞] as the one point compactification of [0,∞), i.e., the point {∞} is appended to the set [0,∞) as the limit point of any increasing and unbounded
sequence. Since the set [0,∞] is compact, any sequence of random variables
taking values in this set, and in particular the sequence of stopping times
{τ_h, h > 0}, is tight.
Theorem 4.3. Under the assumptions of this section, we have W^h(x) →
W(x).

Proof. Consider the pair (ξ^h(·), τ_h). Let {(ξ^h(·), τ_h), h > 0} be a convergent subsequence, with limit denoted by (x(·), τ̄). Using the Skorokhod
representation, we can assume that the convergence is

(4.7)

with probability one. Before we can apply Theorem 1.5 to show W^h(x) →
W(x), there are several issues to resolve. Recall that τ = inf{t : x(t) ∈
{0, B}}. From the definitions of W^h(x) and W(x) we see that to prove
W^h(x) → W(x) we will need

$$\bar\tau = \tau. \tag{4.8}$$
Since k( ·) and g( ·) are continuous and since the sample paths of x( ·) are
continuous w.p.1, sufficient conditions for the continuity w.p.1 of (4.9) are
The boundedness of k(·) and g(·) imply that a sufficient condition for the
uniform integrability of (4.11) is uniform integrability of
Except for the proofs of (4.10) and (4.12), the proof of W^h(x) → W(x) is
complete. ∎
$$w(s) = \int_{\tau}^{\tau + t(s)} \sigma(x(r))\,dw(r)$$
Then for all h ∈ (0, h_0] and x ∈ {0, h, 2h, ..., B}, P_x{τ_h ≥ T} ≤ (1 − δ).
From the Markov property of ξ^h(·), P_x{τ_h ≥ iT} ≤ (1 − δ)^i for i < ∞.
Therefore,
$$\sup_{h \le h_0,\,x} E_x\,\tau_h^2 \le \sum_{i=1}^{\infty} (iT)^2\,(1-\delta)^{i-1} < \infty,$$
which implies the uniform integrability of (4.12). The condition (4.13) can
be established using only weak convergence arguments and properties of
the limit process x(·). First, we note that

for all T > 0. This can be proved in many ways. For example, the inequality
can be shown by using the same time change as used in the proof of (4.10)
together with the explicit form of the Gaussian distribution. Next, assume
that (4.13) is not true. Then there exist T > 0, h_n → 0, and x_n → x ∈ [0, B]
such that lim_n P_{x_n}{τ_{h_n} < T} = 0. Consider processes ξ^{h_n}(·) which start at
x_n rather than at a fixed point x at time t = 0. Because {x_n, n < ∞} is
compact, the same calculations as those used in the case of a fixed starting
position imply {ξ^{h_n}(·), n < ∞} is tight, and also that ξ^{h_n}(·) tends weakly to
a solution x(·) of (4.1). As shown above, the exit time is a continuous
function of the sample paths of x(·) with probability one, so, by Theorem
1.5, τ_{h_n} ⇒ τ. Using part (iii) of Theorem 1.4,
Remarks. In the sequel we will see many variations on the basic ideas of
this example. In all cases the required adaptations are suggested directly
by the particular properties of the problem under consideration. Here we
comment on several aspects of the problem considered in this section.
Part of the power of the approach advocated in this book is the ease
with which it handles less stringent assumptions. Aside from weak sense
uniqueness assumptions on certain limit processes, which would seem to
be expected in any case as a consequence of proper modelling, there is
considerable flexibility in weakening the other assumptions. For example,
suppose the condition σ²(x) ≥ c > 0 on [0, B] is dropped. In order that
the process not have any points at which it is "stuck," let us assume
inf_{x∈[0,B]} [σ²(x) + |b(x)|] > 0. Then with some minor modifications and
appropriate additional assumptions on b(·), the method still can be used.
For example, at points where σ(x) = 0 the definition (4.4) of the approximation w^h(·) to the limiting Wiener process must be modified. This reflects
the fact that we cannot "reconstruct" the Wiener process from the process
x(·) at those points. However, minor modifications of the definition of wh(-)
and the associated filtration solve the problem, and the analogous conclu-
sion regarding convergence of ξ^h(·) follows as before. The details may be
found in Chapter 10.
If we weaken the nondegeneracy assumption on σ(·), then we must also
reconsider the proofs of (4.10) and (4.12). These conditions are not simply
technical nuisances whose validity is not directly related to the convergence
of the schemes. For example, suppose σ(x) = 0 in neighborhoods of both 0
and B. Suppose also that b(B) ≤ 0. Then clearly the process x(·) does not
exit through the point B. An analogous statement holds if b(0) ≥ 0. In such
a case, (4.12) clearly fails. However, it will also be true that for reasonable
choices of k(·), W(x) = ∞. Thus, we can have seemingly reasonable
approximations (i.e., local consistency holds), and only the failure of (4.12)
indicates a difficulty with the problem formulation. Precise conditions for such
degenerate problems and verification of the analogues of (4.10) and (4.12)
will be described in detail in Chapter 10.
for all B ∈ B(U × [0,∞)) [i.e., m(dα dt) = m_t(dα)dt] and such that for each
t, m_t(·) is a measure on B(U) satisfying m_t(U) = 1. For example, we can
define m_t(·) in any convenient way for t = 0 and as the left hand derivative
for t > 0:
$$m_t(A) = \lim_{\delta \to 0} \frac{m(A \times [t - \delta, t])}{\delta}$$
for A ∈ B(U). Let R(U × [0,∞)) denote the set of all relaxed controls on
U × [0,∞).
$$\int \phi(\alpha, s)\,m^n(d\alpha\,ds) \to \int \phi(\alpha, s)\,m(d\alpha\,ds)$$

for any continuous function φ(·,·) on U × [0,∞) having compact support.
Since P(U × [0, j]) is complete, separable, and compact for all j < ∞, these
properties are inherited by the space R(U × [0,∞)) under this metric. It
follows that any sequence of relaxed controls has a convergent subsequence.
This key property will be used often in the sequel. We will write m^n ⇒ m
for convergence in this "weak-compact" topology.
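A standard example of why this topology is natural is a rapidly chattering sequence of ordinary controls converging to a genuinely relaxed limit. With U = {−1, +1}, let

$$u^n(t) = (-1)^{\lfloor nt \rfloor}, \qquad m^n(d\alpha\,dt) = \delta_{u^n(t)}(d\alpha)\,dt.$$

Then m^n ⇒ m in the weak-compact topology, where m(dα dt) = ½[δ_{−1}(dα) + δ_{+1}(dα)] dt, although the u^n(·) themselves converge to no ordinary control.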
$$W(x,u) = \int_0^\infty e^{-\beta s}\, k(x(s), u(s))\,ds$$

in the sense that the cost function and dynamics may be rewritten as

$$W(x,u) = W(x,m) = \int_0^\infty \int_U e^{-\beta s}\, k(x(s), \alpha)\,m(d\alpha\,ds), \tag{5.2}$$
Let m^n(·) denote a sequence of relaxed controls such that the associated
costs converge to the infimum over all relaxed controls. Compactness of
the space R(U × [0,∞)) and boundedness of b(·) imply relative compactness of the sequence (x^n(·), m^n(·)), where x^n(·) is the solution to the
controlled ODE under m^n(·). Suppose we retain n as the index of a convergent subsequence and suppose also that the limit is denoted by (x(·), m(·)).
Owing to the weak convergence m^n(·) ⇒ m(·), x(·) solves (5.3) with the
cost (5.2). Thus, the infimum over the relaxed controls is always attained.
By the approximation theorem of Section 4.6 any relaxed control may be
This chapter is the core of the mathematical part of the book. It deals
with the approximation and convergence theorems for the basic problem
classes: discounted problems with absorbing boundaries; diffusion and jump
diffusion models; optimal stopping problems, and problems where we stop
on hitting a target set and where there is no discounting. The convergence
results for the case of reflecting boundaries and the singular and ergodic
control problems will appear in the next chapter.
The chapter starts off with some approximation and limit results for
a sequence of controlled jump diffusion problems. These results and the
methods which are used provide a base for the later proofs of the con-
vergence of the numerical approximations. The first result, Theorem 1.1,
shows that the limit of a sequence of controlled jump diffusions is also a
controlled jump diffusion. The method of proof is more direct than the
usual martingale problem approach, because we have access to the driving
Wiener processes and Poisson measures. The approach is a combination of
a classical weak convergence method together with a direct construction.
The main points of the theorem are the "admissibility" of the limit controls
and the stopping or exit times. The theorem provides the basis for the ap-
proximation of relaxed controls by simpler controls. These simpler controls
will be applied to the approximating chains to get the convergence results
for the numerical methods. In particular, Theorem 1.2 shows that we can
approximate a relaxed control by a piecewise constant control which takes
values in a finite set.
Theorem 2.1 concerns convergence of sequences of controlled problems
to a limit problem, when control stops at the first moment that a given
$$x(t) = x(0) + \int_0^t \int_U b(x(s), \alpha)\,m_s(d\alpha)\,ds + \int_0^t \sigma(x(s))\,dw(s) + \int_0^t \int_\Gamma q(x(s-), \rho)\,N(ds\,d\rho). \tag{1.1}$$
A1.4. Let u(·) be an admissible ordinary control with respect to (w(·), N(·)),
and suppose that u( ·) is piecewise constant and takes only a finite number
of values. Then, for each initial condition, there exists a solution to (1.1),
where m( ·) is the relaxed control representation of u( ·), and this solution is
unique in the weak sense.
The first member is just a Poisson process with rate λ, and it identifies the
jump times. The second member can be used to get the jump values.
triple. However, we want to work with the largest class of controls possible,
provided that they are admissible. It is not a priori obvious that the optimal
control will in all cases be representable as a function of only (w(·), N(·)).
Thus, we might at least have to augment the probability space, in a way
that depends on the control. It should be understood that the underlying
Wiener process and Poisson measure are fictions to a large extent. They are
very useful to represent and study the processes, but when calculating cost
functions and their limits, only the distributions of the processes are im-
portant, and not the actual probability spaces or the representation of the
solution to the SDE. Under the Lipschitz condition, given the probability
law of (m(·),w(·),N(·)), and the initial condition x, the probability law of
(x(·),m(·),w(·),N(·)) is uniquely determined, and that probability law is
all that is important in computing the cost functions. Let (m^n(·), w(·), N(·))
be a sequence of admissible triples, all defined on the same probability space
with the same standard Wiener process and Poisson measure. The sequence
(or some subsequence) will not generally converge to a limit with probabil-
ity one (or even in probability) on the original probability space, and weak
convergence methods might have to be used to get appropriate limits. But
then the "limit processes" will not, in general, be definable on the original
probability space either. Because we do not want to worry about the actual
probability space, we often let it vary with the control.
If the Lipschitz condition does not hold, but the Girsanov transformation
method is used to get the controlled process from an uncontrolled process
via a transformation of the measure on the probability space, then the
Wiener process is not fixed a priori, and its construction depends on the
control. The considerations raised in the last paragraph also hold here.
These comments provide a partial explanation of the indexing of the Wiener
process and Poisson measure by n.
Theorem 1.1. Assume (A1.1) and (A1.2). Let x^n(0) ⇒ x_0 and let ν_n be
an F_t^n-stopping time. Then any sequence {x^n(·), m^n(·), w^n(·), N^n(·), ν_n}
is tight. Let (x(·), m(·), w(·), N(·), ν) denote the limit of a weakly convergent
subsequence. Define

Then w(·) and N(·) are a standard F_t-Wiener process and F_t-Poisson
measure, respectively, ν is an F_t-stopping time, m(·) is admissible with
respect to (w(·), N(·)), x(0) = x_0, and x(·) satisfies (1.1).
Proof. Tightness. The criterion of Theorem 9.2.1 will be used. Let T < ∞,
and let τ_n be an arbitrary F_t^n-stopping time satisfying τ_n ≤ T. Then, by
the properties of the stochastic integral and the boundedness of q(·) and
the jump rate λ,
where the order O(δ) is uniform in τ_n. Thus, by Theorem 9.2.1, the sequence
{x^n(·)} is tight. The sequences of controls {m^n(·)} and stopping times {ν_n}
are tight because their range spaces [R(U × [0,∞)) and [0,∞], respectively]
are compact. Clearly {w^n(·), N^n(·)} is tight, and any weak limit has the
same law as each of the (w^n(·), N^n(·)) pairs.
Characterization of the limit processes. Now that we have tightness, we
can extract a weakly convergent subsequence and characterize its limit.
For notational convenience, let the original sequence converge weakly, and
denote the limit by (x(·), m(·), w(·), N(·), ν). Because the processes w^n(·)
have continuous paths with probability one, so will w(·). It follows from
the weak convergence that m(t, U) = t for all t. We want to show that
the limit x(·) is a solution to a stochastic differential equation with driving
processes (m(·), w(·), N(·)). This will be done by a combination of a fairly
direct method and a use of the martingale method.
Let δ > 0 and let k be a positive integer. For any process z(·) with paths
in D^k[0,∞), define the piecewise constant process z_δ(·) by

$$z_\delta(t) = z(k\delta) \quad \text{on } [k\delta, k\delta + \delta).$$

uniformly on any bounded time interval with probability one. The sequence
{m^n(·)} converges in the "compact-weak" topology. In particular, for any
continuous and bounded function φ(·) with compact support,
Let p, t, u, t_i, i ≤ p, be given such that t_i ≤ t ≤ t + u, i ≤ p, and P{ν =
t_i} = 0. For q = 1, 2, ..., let {Γ_j^q, j ≤ q} be a sequence of nondecreasing
partitions of Γ such that Π(∂Γ_j^q) = 0 for all j and all q, where ∂Γ_j^q is the
boundary of the set Γ_j^q. As q → ∞, let the diameters of the sets Γ_j^q go to
zero. Let the Wiener processes be ℝ^{r'}-valued, and let f(·) ∈ C_0^3(ℝ^{r'}).

¹The function (φ, m)_t of m(·) is introduced, since any continuous function of
m(·) can be arbitrarily well approximated by continuous functions of the type

for appropriate {t_i} and continuous h(·) and φ_i(·) with compact support.
is an F_t-martingale for all f(·) of the chosen class. Thus, w(·) is a standard
F_t-Wiener process.

We now turn our attention to showing that N(·) is an F_t-Poisson measure. Let θ(·) be a continuous function on Γ, and define the process

By an argument which is similar to that used for the Wiener process above,
if f(·) is a continuous function with compact support, then

$$E H\big(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i, \Gamma_j),\ j \le q,\ i \le p,\ \nu I_{\{\nu \le t\}}\big) \times \cdots$$
with probability one for each t. Let D_b denote the set of points where b_0(·)
is discontinuous. This will hold if ∫_0^t b_0(φ(s))ds is a continuous function
from D^r[0,T] to ℝ^r for each T, with probability one with respect to the
measure induced by x(·). A sufficient condition is that, for each t,

$$\lim_{\varepsilon \to 0} \int_0^t P\big\{x(s) \in N_\varepsilon(D_b)\big\}\,ds = 0.$$
be defined on the same probability space, but this can always be done via
the Skorokhod representation. The value of the cost function depends on
the joint distribution of (x(·), m(·)). In order to simplify the notation, we
write the cost function only as a function of m(·) and the initial condition
x. The brevity of the notation should not cause any confusion.
Theorem 1.2. Assume (A1.1) - (A1.4) and let x(0) = x. For a given
admissible triple (m(·), w(·), N(·)), let the solution x(·) to (1.1) exist and
be unique in the weak sense. For any finite T and β > 0, define the cost
functions

$$W(x,m) = E_x^m \int_0^\infty \int_U e^{-\beta s}\, k(x(s), \alpha)\,m(d\alpha\,ds). \tag{1.10}$$

Given ε > 0, there is a finite set {α_1, ..., α_{k_ε}} = U^ε ⊂ U, and a δ > 0
with the following properties. There is a probability space on which are
defined processes

$$(x^\varepsilon(\cdot), u^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot)), \tag{1.11}$$

where w^ε(·) and N^ε(·) are our standard Wiener process and Poisson measure, respectively, and u^ε(·) is an admissible U^ε-valued ordinary control
which is constant on the intervals [iδ, iδ + δ). Furthermore, the processes
(1.11) satisfy (1.1) and
for the numerical algorithm. This problem was also discussed in Chapter
9. In this section, we will work with the cost function
where β ≥ 0 and τ is the first escape time from the set G^0, the interior of
the set G satisfying (A2.1) below. The discussion concerning continuity of
the escape time below is applicable to general cost functions and not just
to the discounted cost function.
Using the notation of Theorem 1.1, let {x^n(·), m^n(·), w^n(·), N^n(·), τ_n} be
a minimizing sequence, that is,

(2.2)

and τ_n = inf{t : x^n(t) ∉ G^0}. By the results of Theorem 1.1 we know that
{x^n(·), m^n(·), w^n(·), N^n(·), τ_n} has a weakly convergent subsequence. For
notational simplicity, suppose that the original sequence itself converges
weakly and that (x(·), m(·), w(·), N(·), τ̄) denotes the limit. Define the filtration

$$\mathcal{F}_t = \mathcal{F}\big(x(s), m(s), w(s), N(s),\ s \le t,\ \bar\tau I_{\{\bar\tau \le t\}}\big).$$

Then by Theorem 1.1, w(·) and N(·) are a standard F_t-Wiener process and
Poisson measure, respectively. Also, m(·) is admissible, (1.1) holds, and τ̄ is
an F_t-stopping time. If either β > 0 or the {τ_n} are uniformly integrable,
then under the continuity and boundedness of k(·) and g(·), it is always
the case that

If τ̄ = τ, the time of first exit of the limit x(·) from G^0, then we would have

and the existence of an optimal control for the cost function (2.1) would
be proved.
In the figure, the sequence of functions φ^n(·) converges to the limit function φ^0(·), but the sequence of first contact times of φ^n(·) converges to a
time which is not the moment of first contact of φ^0(·) with the boundary
line ∂G. From the illustration, we can see that the problem in this case is
that the limit function is tangent to ∂G at the time of first contact.

For our control problem, if the values W(x, m^n) are to converge to the
value W(x, m), then we need to assure (at least with probability one) that
the paths of the limit x(·) are not "tangent" to ∂G at the moment of first
exit from G^0. Let us now define our requirement more precisely. For φ(·)
in D^r[0,∞) (with the Skorokhod topology used), define the function f(φ)
with values in the compactified infinite interval [0,∞] by: f(φ) = ∞,
if φ(t) ∈ G^0 for all t < ∞, and otherwise

$$f(\phi) = \inf\{t : \phi(t) \notin G^0\}. \tag{2.5}$$
In the example of Figure 10.1, f(·) is not continuous at the path φ^0(·).

If the path φ^0(·) which is drawn in the figure were actually a sample path
of a Wiener process, then the probability is zero that it would be "tangent"
to ∂G at the point of first contact. This is a consequence of the law of the
iterated logarithm for the Wiener process or, more intuitively, because of
the "local wild nature" of the Wiener process. It would cross the boundary
infinitely often in any small interval about its first point of contact with
a smooth boundary. The situation would be similar if the Wiener process
were replaced by the solution to a stochastic differential equation with a
uniformly positive definite covariance matrix a(x). This was illustrated in
Section 9.4, where it was shown that this "tangency" could not happen due
to the law of the iterated logarithm. If such a process were the limit of the
{x^n(·)} introduced above and the boundary ∂G were "smooth" (see the
remarks below), then we would have τ_n → τ, where τ is the first hitting
time of the boundary. If, in addition, β > 0 or {τ_n} is uniformly integrable,
then

$$W(x, m^n) \to W(x, m). \tag{2.6}$$
If the original sequence were minimizing, then W(x, m) = V(x). The same
"boundary" considerations arise when proving the convergence of the nu-
merical approximations Vh(x) to V(x). We will next give conditions which
will guarantee the convergence (2.6).
A2.2. The function f(·) is continuous (as a map from D^r[0,∞) to the
compactified interval [0,∞]) with probability one relative to the measure
induced by any solution to (1.1) for the initial condition x of interest.
For the purposes of the convergence theorems for the numerical approx-
imations starting in Section 10.3, (A2.2) can be weakened as follows:
Remark on (A2.2). (A2.2) and (A2.2') are stated as they are because
little is usually known about the ε-optimal processes. Such conditions are
satisfied in many applications. The main purpose is the avoidance of the
"tangency" problem discussed above. The tangency problem would appear
to be a difficulty with all numerical methods, since they all depend on
some sort of approximation, and implicitly or explicitly one seems to need
some "robustness" of the boundary conditions in order to get the desired
convergence. In particular, the convergence theorems for the classical fi-
nite difference methods for elliptic and parabolic equations generally use a
nondegeneracy condition on a(x) in order to (implicitly) guarantee (A2.2).
The nature of the dynamical equation (1.1) often implies the continuity
of f(·) with probability one, owing essentially to the local "wildness" of the
Wiener process. Let us consider a classical case. Let a(x) be uniformly positive definite in G, and let ∂G satisfy the following "open cone condition":
There are ε_0 > 0, ε > 0, and open (exterior) cones C(y) of radius ε_0 at unit
distance from the origin such that for each y ∈ ∂G, we have

$$\{x : x - y \in C(y),\ |y - x| < \varepsilon\} \cap G^0 = \emptyset.$$

Then by [49, Theorem 13.8], (A2.2) holds.
The checking of (A2.2) for the degenerate case is more complicated, and
one usually needs to take the particular structure of the individual case into
process has not yet stopped) goes to unity as the state value x(t) at that
time approaches ∂G. This will now be formalized.

(2.7)

The randomized stopping rule can be applied to the approximating Markov chain {ξ_n^h, n < ∞} in the same way: Stop the chain at step n with
probability 1 − exp(−λ(ξ_n^h)Δt_n^h) for ξ_n^h ∈ G^0, and with probability one if
ξ_n^h ∉ G^0. The stopping cost is g(ξ_n^h). The computational problem is altered
only slightly. For example, if Δt^h(x, α) does not depend on α, then the
dynamic programming equation (5.8.3) becomes

$$V^h(x) = \big(1 - e^{-\lambda(x)\Delta t^h(x)}\big)\,g(x) + e^{-\lambda(x)\Delta t^h(x)} \times \big[\text{right side of (5.8.3)}\big], \tag{2.8}$$

for x ∈ G_h, and with the boundary condition V^h(x) = g(x) for x ∉ G_h.
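A minimal sketch of one value-iteration sweep implementing (2.8); the operator `step` stands in for the right side of (5.8.3), and the toy periodic operator below is only a placeholder (real use substitutes the locally consistent chain's operator with proper boundary handling).

```python
import numpy as np

def randomized_stopping_update(V, g, lam, dt, step):
    """One sweep of (2.8): stop with probability 1 - exp(-lam(x)*dt(x)) and
    pay g(x); otherwise apply the usual dynamic programming operator."""
    p_stop = 1.0 - np.exp(-lam * dt)
    return p_stop * g + (1.0 - p_stop) * step(V)

# Toy usage on a 1-d grid with a placeholder discounted averaging operator:
n = 101
V = np.zeros(n)
g = np.linspace(0.0, 1.0, n)          # stopping cost g(x)
lam = 5.0 * np.ones(n)                # stopping intensity lambda(x)
dt = 1e-3 * np.ones(n)                # interpolation intervals dt(x)
step = lambda V: 0.995 * 0.5 * (np.roll(V, 1) + np.roll(V, -1))
for _ in range(5000):
    V = randomized_stopping_update(V, g, lam, dt, step)
```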
The Convergence Theorem. Theorem 1.1 and the above discussion yield
the following result:
Theorem 2.1. Assume (A1.1) - (A1.3), (A2.1), and that the cost (2.1) is
used. Let either β > 0, or {τ_n} be uniformly integrable. Let

{x^n(·), m^n(·), w^n(·), N^n(·), τ_n}

be a sequence of solutions to (1.1) and associated stopping times which
converges weakly to (x(·), m(·), w(·), N(·), τ̄). Let (A2.2) hold for the limit.
Then τ̄ = τ with probability one and (2.6) holds. If the sequence is minimizing, then (2.4) holds. Let k(x) ≥ 0 and g(x) = 0, and let the sequence
be minimizing. Then

$$\liminf_n W(x, m^n) \ge W(x, m)$$

and (2.4) holds without (A2.2), where W(x, m) is the cost for the limit
process whether or not the solution to (1.1) is unique.

Assume the randomized stopping in lieu of (A2.2), and replace the {τ_n}
above with the (no larger) randomized stopping times. Then the assertions
in the first paragraph remain true.
Theorem 3.1. Assume (A1.1) - (A1.4), (A2.1), and (A2.2'), and use the
cost function (2.1) with β > 0. Fix ε_0 > 0, and let (x(·), m(·), w(·), N(·)) be
an ε_0-optimal solution whose existence is asserted in (A2.2'). Let τ denote
the first escape time from G^0. Then, for each ε > 0, there is a δ > 0 and a
probability space on which are defined a pair (w^ε(·), N^ε(·)), a control u^ε(·)
of the type introduced in Theorem 1.2, and a solution x^ε(·) such that

There is θ > 0 and a partition {Γ_j, j ≤ q} of Γ such that the approximating
u^ε(·) can be chosen so that its probability law at any time nδ, conditioned on
{w^ε(s), N^ε(s), s ≤ nδ, u^ε(iδ), i < n}, depends only on the initial condition
x = x(0) and on samples

and is continuous in the x, w^ε(pθ) arguments for each value of the other arguments. If the set of stopping times over any set of controls with uniformly
bounded costs is uniformly integrable, then the above conclusions continue
to hold for β = 0.
Comment on the Proof. Only the first paragraph will be proved. Inequality (3.1) and the statements above it are essentially Theorem 1.2. The
only modifications concern the presence of a stopping boundary. The assertions (3.1) and above yield an approximating control u^ε(·) which takes only
finitely many values and is constant on the time intervals [nδ, nδ + δ). To
get the form of the control whose existence is asserted below (3.1), we start
with the control u^ε(·) constructed above and modify it in several stages.
The desired control will be defined via its conditional probability law, given
the past values of the control and the driving Wiener process and Poisson
measure. First it is shown, via use of the martingale convergence theorem,
that the conditional probability law can be arbitrarily well approximated
by a conditional probability law that depends on only a finite number of
samples of the driving processes. In order to get the asserted continuity in
the samples of the Wiener process, a mollifier is applied to the conditional
probability law obtained in the step above.
Proof. Part 1. Let $\varepsilon > 0$. By Theorems 1.1 and 1.2, there are $\delta > 0$, a finite set $U_\varepsilon \subset U$, and a probability space on which are defined a solution to (1.1), namely, $(x^\varepsilon(\cdot), u^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot))$, where $u^\varepsilon(\cdot)$ is $U_\varepsilon$-valued and constant on the intervals $[n\delta, n\delta + \delta)$. Also, $(x^\varepsilon(\cdot), m^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot))$ approximates $(x(\cdot), m(\cdot), w(\cdot), N(\cdot))$ in the sense that as $\varepsilon \to 0$,
$$(x^\varepsilon(\cdot), m^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot)) \Rightarrow (x(\cdot), m(\cdot), w(\cdot), N(\cdot)). \qquad (3.3)$$
Let $\tau^\varepsilon = \inf\{t : x^\varepsilon(t) \notin G^0\}$. There is a stopping time $\bar\tau$ such that
$$\limsup_\varepsilon \;\cdots\; \le \varepsilon_1. \qquad (3.5)$$
Since $\alpha > 0$ is arbitrary, this, (3.4), and the discounting imply that
where $\delta_{\varepsilon 1} \to 0$ as $\varepsilon \to 0$.
Part 2. In order to get the representation asserted below (3.1), we will start with a $u^\varepsilon(\cdot)$ of the type just described in Part 1 for small $\varepsilon$, and modify it slightly to get the desired continuity and "finite dependence" properties. This will be the $u^{\varepsilon\theta\rho}(\cdot)$ defined below. Let $0 < \theta < \delta$. Let the $\{\Gamma_j, j \le q\}$ be partitions of $\Gamma$ of the type used in the proof of Theorem 1.1, and let $q \to \infty$ as $\theta \to 0$. For $\alpha \in U_\varepsilon$, define the function $F_{n\theta}$ as the regular conditional probability
$$P\{u^{\varepsilon\theta}(n\delta) = \alpha \mid x, u^{\varepsilon\theta}(i\delta), i < n, w^{\varepsilon\theta}(s), N^{\varepsilon\theta}(s), s \le n\delta\}$$
$$= F_{n\theta}\left(\alpha; x, u^{\varepsilon\theta}(i\delta), i < n, w^{\varepsilon\theta}(p\theta), N^{\varepsilon\theta}(p\theta, \Gamma_j), j \le q, p\theta < n\delta\right). \qquad (3.8)$$
By the construction of the control law, as $\theta \to 0$
The solution to (1.1) exists (on some probability space) and is (weak sense) unique when the control is piecewise constant and takes only a finite number of values. Using this, we get
$$(x^{\varepsilon\theta}(\cdot), m^{\varepsilon\theta}(\cdot), w^{\varepsilon\theta}(\cdot), N^{\varepsilon\theta}(\cdot)) \Rightarrow (x^\varepsilon(\cdot), m^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot)),$$
where $N(\rho)$ is a normalizing constant such that the integral of the mollifier is unity. The $F_{n\theta\rho}$ are nonnegative, their values sum (over $\alpha \in U_\varepsilon$) to unity, and they are continuous in the $w$-variables for each value of the other variables. Also, they converge to the unmollified function with probability one as $\rho \to 0$. The last assertion and the continuity are consequences of the fact that the probability distribution of a normally distributed random variable is absolutely continuous with respect to Lebesgue measure.
Let $u^{\varepsilon\theta\rho}(\cdot)$ be the piecewise constant admissible control which is determined by the conditional probability distribution $F_{n\theta\rho}(\cdot)$: In particular, there is a probability space on which we can define $(w^{\varepsilon\theta\rho}(\cdot), N^{\varepsilon\theta\rho}(\cdot))$ and the control law $u^{\varepsilon\theta\rho}(\cdot)$ by the conditional probability
$$P\{u^{\varepsilon\theta\rho}(n\delta) = \alpha \mid x, u^{\varepsilon\theta\rho}(i\delta), i < n, w^{\varepsilon\theta\rho}(s), N^{\varepsilon\theta\rho}(s), s \le n\delta\}$$
$$= F_{n\theta\rho}\left(\alpha; x, u^{\varepsilon\theta\rho}(i\delta), i < n, w^{\varepsilon\theta\rho}(p\theta), N^{\varepsilon\theta\rho}(p\theta, \Gamma_j), j \le q, p\theta < n\delta\right).$$
Then, by the construction of the probability law of the controls,
$$(x^{\varepsilon\theta\rho}(\cdot), m^{\varepsilon\theta\rho}(\cdot), w^{\varepsilon\theta\rho}(\cdot), N^{\varepsilon\theta\rho}(\cdot)) \Rightarrow (x^{\varepsilon\theta}(\cdot), m^{\varepsilon\theta}(\cdot), w^{\varepsilon\theta}(\cdot), N^{\varepsilon\theta}(\cdot))$$
as $\rho \to 0$. We therefore have
$$\limsup_{\varepsilon, \theta, \rho} \;\cdots\; \le \varepsilon_1$$
and
$$\limsup_{\varepsilon, \theta, \rho} \left|W(x, m^{\varepsilon\theta\rho}) - W(x, m)\right| \le \delta_w. \qquad (3.10)$$
Putting the above arguments together, and noting that $\varepsilon_1$ can be chosen arbitrarily small, yields that for each $\varepsilon > 0$ there are $\delta > 0$, $\theta > 0$, $q$, $w^\varepsilon(\cdot)$, $N^\varepsilon(\cdot)$, and an admissible control law which is piecewise constant (on the intervals $[n\delta, n\delta + \delta)$) with values in a finite set $U_\varepsilon \subset U$, and determined by the conditional probability law
$$P\{u^\varepsilon(n\delta) = \alpha \mid x, u^\varepsilon(i\delta), i < n, w^\varepsilon(s), N^\varepsilon(s), s \le n\delta\},$$
where the $F_n(\cdot)$ are continuous with probability one in the $w$-variables, for each value of the other variables, and for which (3.1) holds. Owing to the weak sense uniqueness (A2.2'), without loss of generality we can apply a mollifier to the $x$-dependence and suppose that there is continuity in the $(x, w)$-variables.

Under the uniform integrability condition, we can restrict our attention to a finite time interval and the same proof works. •
ularities," this cannot be done as often as one would like. The methods
of proof to be employed are purely probabilistic. They rely on the fact
that the sequence of "approximating" processes which are defined by the
optimal solutions to the Markov chain control problem can be shown to
converge to an optimal process of the original form (1.1). The most useful
methods of proof are those of the theory of weak convergence of probability
measures, which provide the tools needed to characterize the limits of the
sequence of approximating processes. An example was given in Chapter 9.
In this section, we will prove the convergence of the continuous parameter
interpolations $\psi^h(\cdot)$ to controlled processes of the form of (1.1).
(4.2)
and $V^h(x)$ denotes the minimum value. The dynamic programming equation is (4.3.7). Recall that we can approximate the discount factor by any quantity $d^h(x, \alpha)$ such that $d^h(x, \alpha)/e^{-\beta\Delta t^h(x, \alpha)} \to 1$, as discussed in Chapter 4. In the next section it will be shown that $V^h(x) \to V(x)$.
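For instance, the exact exponential factor and a common rational substitute both satisfy this requirement; a small illustration (the values of $\beta$ and $\Delta t^h$ below are made up):

```python
import numpy as np

beta, dt = 0.5, np.array([1e-3, 2e-3, 5e-3])   # illustrative values only

d_exact = np.exp(-beta * dt)        # e^{-beta * dt}
d_ratio = 1.0 / (1.0 + beta * dt)   # a common rational approximation

# Both are admissible discount factors: d_ratio / d_exact -> 1 as dt -> 0.
print(d_ratio / d_exact)
```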
The first term is just a finite (w.p.1) sum, and the second is a stochastic integral. It can be easily verified that the defined process $w^h(\cdot)$ is a martingale. The processes defined by the two terms in (4.5) are orthogonal martingales. The quadratic variation of $w^h(\cdot)$ is just the sum of the quadratic variations of the two components and is
$$\int_0^t D_h^+(s) P_h'(s)\left[P_h(s) D_h(s) D_h'(s) P_h'(s)\right] P_h(s) D_h^{+\prime}(s)\,ds + \int_0^t \left(I - D_h(s) D_h^+(s)\right)\left(I - D_h(s) D_h^+(s)\right)ds = It + E^h(t), \qquad (4.6)$$
where $I$ is the identity matrix and $E^h(t)$ is an error which goes to zero as $h \to 0$, and is due to the error $a^h(x) - a(x)$ [see (4.1.3)].
The second term in (4.5) was constructed to compensate for the degeneracies in the first term; in particular, to assure that the quadratic variation of $w^h(\cdot)$ is close to that of a Wiener process. The first term on the right side of (4.5) is linear between the jump times, and the jumps are bounded above by $\delta_0(h)/\delta_1(h)$, which goes to zero uniformly in all other variables as $h \to 0$. The truncation level $\delta_1(h)$ was chosen to assure that the jumps in $w^h(\cdot)$ would go to zero as $h \to 0$, so that any weak limit would have continuous paths with probability one and, in fact, be a standard Wiener process. The fact that any weak limit is a Wiener process is implied by the fact that a continuous local martingale whose quadratic variation function is $It$ must be a Wiener process (Section 9.3). Using the "differential" notation, note that [ignoring the negligible error $E^h(t)$]
$$\sigma(\psi^h(t))\,dw^h(t) = P_h(t) D_h(t)\,dw^h(t)$$
$$= P_h(t) D_h(t)\left[D_h^+(t) P_h'(t)\,dM^h(t) + \left(I - D_h(t) D_h^+(t)\right)dw(t)\right]$$
$$= dM^h(t) + \left[P_h(t) D_h(t) D_h^+(t) P_h'(t) - I\right]dM^h(t) + O(\delta_1(h))\,dw(t).$$
Thus, we can write
where, for each $t$, $E\sup_{s \le t}|\varepsilon_1^h(s)| \to 0$ as $h \to 0$. We can now write (4.1) as
where the terms in (4.8) are defined in Section 5.6. An approximation $N^h(\cdot)$ to a Poisson measure can be written in terms of $\{\nu_n^h, \rho_n\}$. For a Borel set $H$ in $\Gamma$, define $N^h(t, H)$ by
where
Proof. The direct method of Theorem 1.1 will be used. The sequences $\{m^h(\cdot), h\}$ are always tight since their range spaces are compact. Let $T < \infty$, and let $\bar\nu_h$ be an $\mathcal{F}_t^h$-stopping time which is no bigger than $T$. Then for $\delta > 0$,
$$E^h_{\bar\nu_h}\left|w^h(\bar\nu_h + \delta) - w^h(\bar\nu_h)\right|^2 = O(\delta) + \epsilon_h,$$
where $\epsilon_h \to 0$ uniformly in $\bar\nu_h$. Thus, by Theorem 9.2.1, the sequence $\{w^h(\cdot)\}$ is tight. A similar argument yields the tightness of $\{M^h(\cdot)\}$. The sequence $\{N^h(\cdot)\}$ is tight (Theorem 9.2.1) because the mean number of jumps on any bounded interval $[t, t+s]$ is bounded by $\lambda s + \delta_1^h(s)$, where $\delta_1^h(s)$ goes to zero as $h \to 0$, and
where
of the type used in Theorem 4.1 has a weakly convergent subsequence whose limit processes satisfy (4.10). Abusing notation, let the given sequence converge weakly and denote the limit by $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$. Let $\beta > 0$. Then, by the weak convergence Theorem 4.1, it is always the case that
and
$$E_x^{m^h} e^{-\beta\tau_h} g(\psi^h(\tau_h)) \to E_x^m e^{-\beta\bar\tau} g(x(\bar\tau)).$$
It is not always the case that the limit $\bar\tau = \tau$, the first time of escape of $x(\cdot)$ from $G^0$, analogous to the situation in Section 10.2 and Chapter 9. All the considerations discussed in these sections concerning the continuity of the exit times also hold here. Using Theorem 4.1 and following the procedure in Section 10.2 for dealing with the continuity of the first exit time, we have the following theorem, which is one half of the desired result (5.1). The last assertion of Theorem 5.1 follows from the weak convergence, Fatou's lemma, and the fact that (using the Skorokhod representation of Chapter 9 for a weakly convergent subsequence so that the convergence is with probability one) $\liminf_h \tau_h \ge \tau$. A criterion for the uniform integrability is given in Theorem 5.2.
$$\liminf_h V^h(x) \ge V(x). \qquad (5.3)$$
The main idea in the proof is to use the minimality of the cost function $V^h(x)$ for the Markov chain control problem. Given an "almost optimal" control for the $x(\cdot)$ process, we adapt it for use on the chain, and then use the minimality of $V^h(x)$ and the weak convergence to get (5.4). Note that $\tau'$ in (5.6) below is the infimum of the escape times from the closed set $G$. It is larger than or equal to $\tau$, the escape time from the interior $G^0$. Condition (5.6) holds if there is some $T_1$ such that for each $x \in G$,
(5.5)
In fact, the proof shows that (5.6) implies the uniform integrability of $\{\tau_h\}$.

Theorem 5.2. Assume (A1.1)-(A1.4) and (A2.1). Let $\beta > 0$, and assume (A2.2). Then (5.4) and hence (5.1) hold. If instead (A2.2') holds or if the randomized stopping rule is used, then (5.1) continues to hold.

Let $\beta = 0$. Then the assertion continues to hold if $\{\tau_h\}$ is uniformly integrable. Define $\tau' = \inf\{t : x(t) \notin G\}$. Assume that there are $T_1 < \infty$ and $\delta_1 > 0$ such that
(5.6)
Then under the other conditions of the $\beta > 0$ case the conclusions continue to hold.
Proof. The proof of the first part of the theorem is given only under (A2.2), since the proofs under (A2.2') and the randomized stopping rule are similar. Let $\beta > 0$ and let $\varepsilon$ and $\theta$ be as in Theorem 3.1. As noted above, we only need to prove (5.4). Let $(m^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot))$ be an admissible triple for (1.1), where $m^\varepsilon(\cdot)$ is a relaxed control representation of an ordinary control which is determined by the conditional distribution on the right side of (3.11). Let $x^\varepsilon(\cdot)$ denote the associated solution to (1.1). By Theorem 3.1 and (A2.2), we can suppose that $(x^\varepsilon(\cdot), m^\varepsilon(\cdot), w^\varepsilon(\cdot), N^\varepsilon(\cdot))$ is $\varepsilon$-optimal and that
$$P_x^{m^\varepsilon}\left\{\tau(\cdot) \text{ not continuous at } x^\varepsilon(\cdot)\right\} = 0.$$
Let $\{\xi_n^h, n < \infty\}$ and $\psi^h(\cdot)$ denote the controlled Markov chain and continuous parameter interpolation, respectively, for the control law to be defined below. Similarly, let $w^h(\cdot)$ and $N^h(\cdot)$ be defined by (4.5) and (4.9), respectively, for this chain, and let $\tau_h$ denote the first escape time from $G^0$. The $(w^h(\cdot), N^h(\cdot))$ will replace the $(w(\cdot), N(\cdot))$ in the arguments of the $F_n(\cdot)$ of (3.11). Because the interpolated process $\psi^h(\cdot)$ changes values at random times which might not include the times $\{n\delta\}$ at which the control changes in the discrete time approximation which led to (3.11), we need to alter slightly the timing of the changes of the control values. Let $\{\tau_k^h, k < \infty\}$ denote the jump times of $\psi^h(\cdot)$ and define
$$\sigma_n^h = \min\{\tau_k^h : \tau_k^h \ge n\delta\},$$
the first jump time of $\psi^h(\cdot)$ after or at time $n\delta$. For each $n$, we have
We will choose $\{u_n^h, n < \infty\}$ such that $u^h(\cdot)$ will be constant on the intervals $[\sigma_n^h, \sigma_{n+1}^h)$, with values determined by the conditional probability law (3.11). In particular, for $k$ such that $\tau_k^h \in [n\delta, n\delta + \delta)$, use the control law $u_k^h = u^h(\sigma_n^h)$, which is determined by the following conditional distribution at time $\sigma_n^h$
and
We show first that (5.6) implies that there are $T_2 < \infty$ and $\delta_4 > 0$ such that for small $h$
(5.12)
Suppose that (5.12) does not hold. Then there are sequences $y_h \in G$, $T_h \to \infty$, and admissible $m^h(\cdot)$ and associated processes $\psi^h(\cdot)$ with initial conditions $y_h \in G$ such that
(5.13)
$$\liminf_h P_{y_h}^{m^h}\{\tau_h \ge 2T_1\} \ge P_y^m\{\tau' \ge T_1\} \ge \delta_1,$$
which implies that $E_x^{m^h}\tau_h \le T_2 + T_2/(\delta_4/2)$ and indeed that for each integer $k$, $E_x^m(\tau_h)^k$ is bounded uniformly in $h$, $m$, and $x \in G$. •
Theorem 5.3. Assume the conditions of Theorem 4.1 with the following exceptions. (i) There are sets $\tilde G^h \subset G$ and compact $\tilde G$ such that $\tilde G^h \downarrow \tilde G$, and there is local consistency of the approximating chain except possibly on $\tilde G^h \cap G_h^0$. (ii) There are bounded $\{\tilde b_n^h, \tilde a_n^h\}$ such that for $x \in \tilde G^h$,
Let $B(y), A(y)$ denote the sets of possible values of $\tilde b(t), \tilde a(t)$, respectively, when $x(t) = y \in \tilde G$. Suppose that the solution to (5.14) does not depend on the choices of the "tilde" functions in $\tilde G$ within the values allowed by the sets $B(y), A(y)$. Then the limit does not depend on the choices made in the sets $\tilde G^h$.

If the conditions of Theorem 5.2 hold, but with the above exceptions, then the conclusions of that theorem continue to hold.
A6.1. The solution to (6.1) is unique in the weak sense for each initial condition $x \in G^0$, in that if $\rho$ is an $\mathcal{F}_t$-stopping time and $x(\cdot)$ is a solution to (6.1), then the probability law of $(w(\cdot), N(\cdot), \rho)$ determines the law of $(x(\cdot), w(\cdot), N(\cdot), \rho)$. Also, either $f(\cdot)$ is continuous with probability one
The next theorem gives a condition which guarantees that we need only
consider stopping times whose moments are uniformly bounded.
Theorem 6.1. Assume (A2.1) and (A1.1)-(A1.3) without the control, and let $\inf_{x \in G} k(x) = k_0 > 0$. Assume (A6.1). Then there exists an optimal stopping time $\rho$ and
$$E_x\rho \le 2\max_{y \in G}|g(y)|/k_0. \qquad (6.3)$$
For the continuous parameter chain $\psi^h(\cdot)$ and a stopping time $\rho_h$, the analogous cost is
Let $V^h(x)$ denote the optimal value function. Then the dynamic programming equation for both the discrete and continuous parameter chain problem is the usual minimization of the stopping cost against the one-step continuation value.
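In computable form this minimization reads $V^h(x) = \min\{g(x),\, k(x)\Delta t^h(x) + e^{-\beta\Delta t^h(x)}\sum_y p^h(x, y)V^h(y)\}$; the following sketch iterates it for an uncontrolled chain, with all data illustrative assumptions rather than quantities from the text.

```python
import numpy as np

def optimal_stopping_value(P, dt, k, g, beta, tol=1e-10):
    """Value iteration for the optimal stopping equation
    V(x) = min{ g(x), k(x) dt(x) + e^{-beta dt(x)} sum_y P(x,y) V(y) }.
    P: (n, n) transition matrix; dt, k, g: length-n arrays.  With
    k >= k0 > 0 (as in Theorem 6.1) or beta > 0, the iteration converges."""
    V = g.astype(float).copy()          # stopping immediately is feasible
    disc = np.exp(-beta * dt)
    while True:
        V_new = np.minimum(g, k * dt + disc * (P @ V))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```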
Proof. We work with the first set of conditions only. The proof uses an approximation procedure as in Theorems 4.1, 5.1, and 5.2. Let $(\psi^h(\cdot), \rho_h)$ denote the continuous parameter approximating chain and its optimal stopping time, respectively, and define $w^h(\cdot)$ and $N^h(\cdot)$ as in (4.5) and (4.9), respectively. The sequence
$$(\psi^h(\cdot), w^h(\cdot), N^h(\cdot), \rho_h)$$
is tight, and we can assume that the $\rho_h$ satisfy the bound in (6.3) for all $h$ and $x \in G^0$. By use of the Markov property, as at the end of Theorem 5.2, it can be shown that this boundedness implies that $\limsup_h E_x(\rho_h)^k < \infty$ for any positive integer $k$. Thus, the sequence of stopping times is uniformly integrable. Let $(x(\cdot), w(\cdot), N(\cdot), \rho)$ denote the limit of a weakly convergent subsequence. Then, analogously to the situation in Theorem 4.1, (6.1) holds for the limit processes and there is a filtration $\mathcal{F}_t$ such that $w(\cdot)$ is an $\mathcal{F}_t$-Wiener process, $N(\cdot)$ is an $\mathcal{F}_t$-Poisson measure, $\rho$ is an $\mathcal{F}_t$-stopping time, and $x(\cdot)$ is adapted to $\mathcal{F}_t$. By the uniform integrability and the weak convergence,
$$W^h(x, \rho_h) = V^h(x) \to W(x, \rho) \ge V(x).$$
To get the reverse inequality, we proceed as in Theorem 5.2 and use a "nice" $\varepsilon$-optimal stopping rule for (6.1) and apply it to the chain. Then a weak convergence argument and the fact that $V^h(x)$ is optimal for the chain yields the desired reverse inequality. Let $\varepsilon > 0$. First note that there are $\delta > 0$ and $T < \infty$ such that we can restrict the stopping times for (6.1) to take only the values $\{n\delta, n\delta \le T\}$ and increase the cost (6.2) by at most $\varepsilon$. Let $\bar\rho$ be an optimal stopping time for (6.1), (6.2) with this restriction. Proceeding as in Section 10.5, we can assume that this $\varepsilon$-optimal stopping time is defined by functions $F_n(\cdot)$ which are continuous in the $w$-variables for each value of the other variables and such that the probability law of $\bar\rho$ is determined by $P\{\bar\rho = 0\}$ and, for $n \ge 1$,
$$P\{\bar\rho = n\delta \mid x, w(s), N(s), s \le n\delta, \bar\rho > n\delta - \delta\}$$
$$= F_n\left(x, w(p\theta), N(p\theta, \Gamma_j), j \le q, p\theta < n\delta\right),$$
$$x(t) = x(0) + \int_0^t\int_U b(x(s), \alpha)\,m_s(d\alpha)\,ds + \int_0^t \sigma(x(s))\,dw(s) + \int_0^t\int_\Gamma q(x(s-), \rho)\,N(ds\,d\rho) + z(t), \qquad (1.1)$$
where $m(\cdot)$ is an admissible control and the reflection term satisfies
$$z(t) = \int_0^t \gamma(s)\,d|z|(s), \qquad \gamma(s) \in r(x(s)).$$
$$W(x, m) = E_x^m\int_0^\infty\int_U e^{-\beta s}\left[k(x(s), \alpha)\,m_s(d\alpha)\,ds + c'(x(s))\,dz(s)\right], \qquad (1.3a)$$
where we suppose that $c'(x)\gamma \ge 0$ for all $\gamma \in r(x)$ and all $x$. If $G$ is a convex polyhedron where $z(\cdot)$ has the representation (5.7.3), $z(t) = \sum_i r_i Y_i(t)$, then
$$W(x, m) = E_x^m\int_0^\infty\int_U e^{-\beta s}\left[k(x(s), \alpha)\,m_s(d\alpha)\,ds + c'\,dY(s)\right]. \qquad (1.3b)$$
For the polyhedral $G$ case, where the representation (5.7.3) holds, (1.4) and (1.4') hold with $Y(\cdot)$ replacing $z(\cdot)$.
Proof. Inequality (1.4') follows from (1.4), and the last assertion follows from the conditions on the boundary and (1.4) and (1.4'). We start by using the upper bound on the variation given in [42, Theorem 3.4], with a different notation. For any set $D$ and $\delta > 0$, define the neighborhood $N_\delta(D) = \{x : \inf_{y \in D}|x - y| < \delta\}$. By the proof of the cited theorem, our conditions on the boundary $\partial G$ and reflection directions in conditions (i)-(v) of Section 5.7 imply the following: There are $\delta > 0$, $L < \infty$, open sets $D_i, i \le L$, and vectors $\nu_i, i \le L$, such that $\cup_i D_i \supset G$, and if $x \in N_\delta(D_i) \cap \partial G$, then $\nu_i'\gamma > \delta$ for all $\gamma \in r(x)$.

The jump and drift terms are unimportant in the proof and we drop them henceforth. Then, write the simplified (1.1) as
$$x(t) = R(t) + z(t), \qquad R(t) = x(0) + \int_0^t \sigma(x(s))\,dw(s).$$
Fix $T < \infty$. Define a sequence of stopping times $\beta_n$ and indices $i_n$ recursively as follows: Set $\beta_0 = 0$. Let $i_0$ be such that $R(0) = x(0) \in D_{i_0}$. Set $\beta_1 = \inf\{t : x(t) \notin N_\delta(D_{i_0})\}$. Define $i_1$ such that $x(\beta_1) \in D_{i_1}$. Continue in the same way to define all $\beta_n$ and $i_n$. By the definition of the $\beta_i$,
$$|x(\beta_i) - x(\beta_{i-1})| \ge \delta/2.$$
By the proof of [42, Theorem 3.4], the conditions in the first paragraph imply that
$$\nu_{i_{m-1}}'\left(x(\beta_m) - x(\beta_{m-1})\right) - \nu_{i_{m-1}}'\left(R(\beta_m) - R(\beta_{m-1})\right) \ge \delta\left(|z|(\beta_m) - |z|(\beta_{m-1})\right). \qquad (1.6)$$
Define $N_T = \min\{n : \beta_n \ge T\} - 1$. Then, using the fact that the $x(t)$ are uniformly bounded, there is $\delta_1 < \infty$ such that
$$|z|(T) \le \delta_1 N_T - \sum_{m=1}^{N_T}\nu_{i_{m-1}}'\left[R(\beta_m \wedge T) - R(\beta_{m-1} \wedge T)\right] + \delta_1|x(T) - x|, \qquad (1.7)$$
where $x = x(0)$. Then, using the boundedness of $x(t)$, there are $\delta_2 < \infty$ and a nonanticipative and bounded process $a(\cdot)$ such that
$$\cdots + \delta_2|x(T) - x|^2. \qquad (1.8)$$
We need only estimate $N_T$ and the right hand term in (1.8). There is $\delta_3 < \infty$ such that
$$P_x^m\left\{\sup_{s \le T}|R(s)| \ge \delta/4\right\} \le 16E_x^m|R(T)|^2/\delta^2 \le \delta_3 T. \qquad (1.9a)$$
An argument analogous to the last part of the proof of Theorem 1.8 below implies that there is $\epsilon_1(\cdot)$ such that
where $\epsilon_1(T)$ goes to zero as $T \to 0$. Via (1.9a,b) and the boundedness of $G$, we have
(1.9c)
where $\epsilon_2(T)$ goes to zero as $T \to 0$. Now, given $\epsilon$ small and positive, let $T$ be small enough that $\epsilon_1(T) + \delta_3 T \le \epsilon$. Then
which yields that $\sup_{m,x} E_x^m(N_T)^k \to 0$ as $T \to 0$ for any $k < \infty$. This last fact, together with (1.9c) and (1.8), yields (1.4). •
control sequence. Define the processes $w^h(\cdot)$ as in (10.4.5). Recall the representation (5.7.5) of the continuous parameter Markov process $\psi^h(\cdot)$ which is appropriate for the problem with a reflecting boundary, and which we now rewrite in relaxed control notation
and $E|\delta_1^h(t)| \to 0$, uniformly in $t$ in any bounded set. The cost functions which we use for the Markov chain interpolation are those in (5.8.19), and we rewrite them here as
$$W^h(x, m^h) = E_x^{m^h}\int_0^\infty e^{-\beta s}\left[k(\psi^h(s), u^h(s))\,ds + c'(\psi^h(s))\,dz^h(s)\right] + \epsilon^h$$
$$= E_x^{m^h}\int_0^\infty\int_U e^{-\beta s}\left[k(\psi^h(s), \alpha)\,m^h(d\alpha\,ds) + c'(\psi^h(s))\,dz^h(s)\right] + \epsilon^h, \qquad (1.12)$$
with the analogous form for (1.11b).

The error term $\epsilon^h$ is due to the approximation of the states on $\partial G_h^+$ by the previous state in $G_h$. By (5.7.4), it satisfies
Define
$$\tilde t_n^h = \sum_{i=0}^{n-1}\Delta\tilde t_i^h.$$
Define the stretched out time scale $\hat T^h(\cdot)$ as follows: $\hat T^h(0) = 0$, the derivative of $\hat T^h(\cdot)$ is unity on $[\tilde t_n^h, \tilde t_{n+1}^h]$ if step $n$ is not a reflection step (i.e., $\xi_n^h \in G_h$), and is zero otherwise. Thus, in particular,
$$\hat T^h(\tilde t_n^h) = t_n^h. \qquad (1.14)$$
The new time scale is illustrated in Figure 11.1, where the indicated states $\xi_n^h$ are in $\partial G_h^+$. Now define the rescaled or "stretched out" processes (denoted as the "hat" processes) $\hat\psi^h(\cdot), \hat u^h(\cdot)$, etc., by
$$\hat\psi^h(t) = \psi^h(\hat T^h(t)), \quad \hat u^h(t) = u^h(\hat T^h(t)), \quad \hat m(d\alpha, t) = m(d\alpha, \hat T^h(t)),$$
etc. The time scale is stretched out at the reflection steps by an amount equal to the absolute value of the conditional mean value of the increment $\Delta\xi_n^h$, namely by $|\Delta z_n^h|$. For the case in the figure, $|\hat z^h|(t) = 0$ until the first reflection step, where it jumps by the two consecutive reflection increments $|\Delta z_n^h| + |\Delta z_{n+1}^h|$, and then it is constant until the next reflection step.
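The construction of the stretched out scale is mechanical once each step of the chain is classified. A sketch follows, with a hypothetical list of steps, each carrying its stretched increment and a reflection flag (at a reflection step the stretched increment plays the role of $|\Delta z_n^h|$, while the interpolated clock does not advance):

```python
def stretched_time_scale(steps):
    """Given chain steps as (dt, is_reflection) pairs, return the pairs
    (t_tilde_n, T_hat(t_tilde_n)): the stretched clock advances by dt at
    every step, while T_hat advances only at non-reflection steps.
    Illustrative sketch only."""
    t_tilde, t_hat, out = 0.0, 0.0, [(0.0, 0.0)]
    for dt, is_reflection in steps:
        t_tilde += dt
        if not is_reflection:
            t_hat += dt
        out.append((t_tilde, t_hat))
    return out

# Example: a reflection step (stretched by |dz|) between diffusion steps.
print(stretched_time_scale([(0.1, False), (0.05, True), (0.1, False)]))
```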
Note that
$$\hat m^h(d\alpha\,ds) = m_{\hat T^h(s)}(d\alpha)\,d\hat T^h(s).$$
$$\cdots\;\int_0^t a(\hat\psi^h(s))\,d\hat T^h(s) + \hat\delta^h(\hat T^h(t))$$
(where $E\sup_{s \le \hat T^h(t)}|\hat\delta^h(s)| \to 0$ as $h \to 0$), and (5.7.6) becomes
Theorem 1.2. Assume the conditions of Subsection 11.1.1 and let $\{u_n^h, n < \infty\}$ be an admissible control sequence for the approximating Markov chain. Then the sets of processes
$$Q^h(\cdot) = \{\hat m^h(\cdot), \hat w^h(\cdot), \hat N^h(\cdot)\}, \quad \{\hat z^h(\cdot), \hat\delta_1^h(\cdot)\}$$
are tight. The $\hat\delta_1^h(\cdot)$ converge weakly to the zero process. Let $h$ index a weakly convergent subsequence.
The pair $(\hat w(\cdot), \hat N(\cdot))$ are a standard Wiener process and Poisson measure, respectively, with respect to the natural filtration, and $\hat m(\cdot)$ is admissible. Also, $\hat x(t) \in G$. Let $\hat{\mathcal F}_t$ denote the $\sigma$-algebra which is generated
$$\hat J(t) = \int_0^t\int_\Gamma q(\hat x(s-), \rho)\,\hat N(ds\,d\rho).$$
The limit processes satisfy
The process $\hat z(\cdot)$ can change only at those $t$ for which $\hat x(t) \in \partial G$. It is differentiable, and the derivative satisfies
$$\frac{d}{dt}\hat z(t) \in r(\hat x(t)).$$
Remark on the Proof. The proof parallels that of Theorem 1.1. It uses the validity of (1.6) for $\psi^h(\cdot)$ and $z^h(\cdot)$ (for small $h$) and the fact that for any $\epsilon > 0$ and sequence $\{m^h(\cdot)\}$ of admissible controls,
The latter fact can be proved by an argument for the "stretched out" or "hatted" processes of the type used in the last part of Theorem 1.7 below.

Then, $T(\cdot)$ is right continuous and $T(t) \to \infty$ as $t \to \infty$, with probability one. For any process $\hat\phi(\cdot)$, define the "inverse" $\phi(t) = \hat\phi(T(t))$, and let $\mathcal{F}_t$ denote the minimum $\sigma$-algebra which measures $\{\Psi(s), s \le t\}$. Then $w(\cdot)$ and $N(\cdot)$ are a standard $\mathcal{F}_t$-Wiener process and Poisson measure, respectively. Also, $m(\cdot)$ is admissible with respect to $(w(\cdot), N(\cdot))$, and (1.1) and (1.2) hold.
Proof. Inequality (1.23) implies that $\hat T(t) \to \infty$ with probability one as $t \to \infty$. Thus, $T(t)$ exists for all $t$ and $T(t) \to \infty$ as $t \to \infty$ with probability one. By (1.20) and (1.21) we also have
$$EH\left(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i, \Gamma_j), z(t_i), j \le q, i \le p\right)\left[w(t+u) - w(t)\right] = 0,$$
$$EH\left(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i, \Gamma_j), z(t_i), j \le q, i \le p\right)\left[w(t+u)w'(t+u) - w(t)w'(t) - uI\right] = 0.$$
Thus, $w(\cdot)$ is an $\mathcal{F}_t$-Wiener process. We omit the details concerning the fact that $N(\cdot)$ is an $\mathcal{F}_t$-Poisson measure. It follows that $m(\cdot)$ is admissible with respect to $(w(\cdot), N(\cdot))$, using the filtration $\mathcal{F}_t$. Finally, a rescaling in (1.17) yields that (1.1) and (1.2) hold. •
The Limits of the Cost Functions. The next theorem shows that the
costs Vh(x) and Wh(x, mh) converge to the costs for the limit processes
and that
$$\liminf_h V^h(x) \ge V(x). \qquad (1.24)$$
or of
(1.27b)
according to the case of interest. The integrability properties of (1.27) imply that $\hat T^h(s) \to \infty$ as $s \to \infty$, with probability one. Thus, the cost (1.12) can be written as
The term
$$\int_0^T e^{-\beta\hat T^h(s)}\,c'(\hat\psi^h(s-))\,d\hat z^h(s)$$
can be arbitrarily well approximated (uniformly in $h$) by a finite Riemann sum for which the number of terms does not depend on $h$. These facts, the uniform integrability properties of (1.27), and the weak convergence imply that $W^h(x, m^h)$ converges to
A1.1. For each $\varepsilon > 0$ and initial condition of interest, there is an $\varepsilon$-optimal solution $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N(\cdot))$ to (1.1), (1.2) which is unique in the weak sense. That is, the distribution of $(m(\cdot), w(\cdot), N(\cdot))$ implies that of $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N(\cdot))$.

A1.2. Let $u(\cdot)$ be an admissible ordinary control with respect to $(w(\cdot), N(\cdot))$, and suppose that $u(\cdot)$ is piecewise constant and takes only a finite number of values. Then, for each initial condition, there exists a weak sense solution to (1.1) and (1.2), where $m(\cdot)$ is the relaxed control representation of $u(\cdot)$, and this solution is unique in the weak sense.

A1.3. For the case where $G$ is a convex polyhedron and the representation $z(t) = \sum_i r_i Y_i(t)$ is used only: Either (a) or (b) holds. (a) The covariance $a(x)$ is nondegenerate for each $x$. (b) Let $c_i > 0$, so that there is a positive cost associated with boundary face $\partial G_i$. Then at each edge or corner which involves $\partial G_i$, the set of reflection directions on the adjoining faces is linearly independent.
There is $\theta > 0$ and a partition $\{\Gamma_j, j \le q\}$ of $\Gamma$ such that the approximating $u^\varepsilon(\cdot)$ can be chosen so that its probability law at any time $n\delta$, conditioned on $\{w^\varepsilon(s), N^\varepsilon(s), s \le n\delta, u^\varepsilon(i\delta), i < n\}$, depends only on the initial condition $x = x(0)$ and on the samples
$$\{w^\varepsilon(p\theta), N^\varepsilon(p\theta, \Gamma_j), j \le q, p\theta < n\delta\},$$
and is continuous in the $x, w^\varepsilon(p\theta)$ arguments for each value of the other arguments.
Theorem 1.7. Under the other conditions of this section and (A1.1)-(A1.3), (1.25) holds.

Proof. The proof is similar to that of Theorem 10.5.2. Use the assumed uniqueness conditions and the uniform integrability of
(which follows from Theorem 1.1) to get a comparison control of the type used in Theorem 10.5.2. Then use the weak convergence results of Theorems 1.2, 1.4, and 1.5. •
Proof. Define $T^h(t) = \inf\{s : \hat T^h(s) > t\}$. Suppose that $T(t)$ exists for each $t$ and $T(\cdot)$ is continuous, with probability one. Then $\{T^h(\cdot)\}$ must be tight and the weak limit must equal $T(\cdot)$ with probability one. Then, since
Thus, we need only prove the existence of $T(t)$ for each $t$ and the continuity of $T(\cdot)$.

For the rest of the proof, we drop the jump term $\hat J(\cdot)$, because we can always work "between" the jumps. Suppose that the inverse $T(t)$ does not exist for all $t$. Then, loosely speaking, there must be a sequence of intervals whose length does not go to zero and such that $\hat T^h(\cdot)$ "flattens" out on them. More particularly, there are $\rho_0 > 0$, $\epsilon_0 > 0$, $t_0 > 0$, and a sequence of random variables $\{\nu_h\}$ such that for all $\epsilon > 0$
with limit
$$(\hat x(\cdot), \hat m(\cdot), \hat w(\cdot), \ldots).$$
Then on a set of probability greater than $\rho_0$, we have
Thus, on this set $d\hat x(t) = d\hat z(t)$ on $[0, t_0]$ and $\hat x(t) \in \partial G$. This violates the conditions on the reflection directions. In particular, the conditions on the boundary and reflection directions (i)-(v) of Section 5.7.3 imply that $\hat x(\cdot)$ cannot remain on the boundary on any time interval on which $\hat z(\cdot)$ is not constant. The possible values of $\hat z(s)$ on that interval would force $\hat x(\cdot)$ off the edge. Thus, $T(\cdot)$ exists for all $t < \infty$ with probability one. The same proof can be applied to yield the continuity of $T(\cdot)$. •
where the $v_i$ are given vectors and the $F^i(\cdot)$ are real valued processes. The interpolation $\psi^h(\cdot)$ can be represented as
the terms $M^h(t)$, $J^h(t)$, $z^h(t)$, $\delta^h(t)$ are as in Section 5.7, and $F^h(t)$ satisfies (8.3.4). The cost function for the approximating chain is taken to be (8.3.5)
Theorem 2.1. Assume the conditions of this section and let $E|F(T)|^2 < \infty$ for each $T < \infty$. Then

Theorem 2.2. Under the assumptions of this section and the condition that
$$\limsup_h \sup_n E|F^h(n+1) - F^h(n)|^2 < \infty, \qquad (2.2)$$
we have
In Section 11.1, it was not a priori obvious that the reflection terms $\{z^h(\cdot)\}$ were tight. We dealt with that problem by use of a stretched out time scale. But in Theorem 1.8, we showed that the $\{w^h(\cdot)\}$ actually were tight. The situation is more complicated here, because the control sequence $\{F^h(\cdot)\}$ can always be chosen such that neither it nor the associated sequence of solutions is tight. The time rescaling method of Section 11.1 still works well. In order to use it, we will have to redefine the rescaling to account for the singular control terms. Recall the trichotomy used in Chapter 8, where each step of the chain $\{\xi_n^h, n < \infty\}$ is either a control step, a reflection step, or a step where the transition function for the uncontrolled and unreflected case is used (which we call a "diffusion" step, even if there is a "Poisson jump"). Redefine $\Delta\tilde t_n^h$ by
Recall the definition of $\hat T^h(\cdot)$ which was used above (1.14). Redefine $\hat T^h(\cdot)$ such that its slope is unity on the interval $[\tilde t_n^h, \tilde t_{n+1}^h]$ only if $n$ is a diffusion step [that is, if $\xi_n^h \in G_h$ and no control is exercised], and the slope is zero otherwise. Now, redefine the processes $\hat\psi^h(\cdot)$, etc., with this new scale, analogously to what was done in Section 11.1. In place of (1.15) we have
Theorem 2.3. Assume the conditions of this section. Then the sets of processes
$$Q^h(\cdot) = \{\hat w^h(\cdot), \hat N^h(\cdot)\}, \quad \{\hat\delta_1^h(\cdot), \hat Y^h(\cdot), \hat U^h(\cdot), \hat F^h(\cdot)\}$$
are tight, and the $\hat\delta_1^h(\cdot)$ converge to the zero process. Let $h$ index a weakly convergent subsequence of $\{\hat\Psi^h(\cdot), Q^h(\cdot)\}$ with limit
The $\hat w(\cdot)$ and $\hat N(\cdot)$ are the standard Wiener process and Poisson measure, respectively, with respect to the natural filtration, and $\hat x(t) \in G$. Let $\hat{\mathcal F}_t$ denote the $\sigma$-algebra which is generated by $\{\hat\Psi(s), s \le t\}$. Then $\hat w(T(t)) = w(t)$, and $\hat w(\cdot)$ is an $\hat{\mathcal F}_t$-martingale with quadratic variation $\int_0^t a(\hat x(s))\,d\hat T(s)$. Also,
$$\hat J(t) = \int_0^t\int_\Gamma q(\hat x(s-), \rho)\,\hat N(ds\,d\rho),$$
and
The process $\hat F^i(\cdot)$ [respectively, $\hat Y^i(\cdot)$] can change only at those $t$ for which $\hat x^i(t) = 0$ [respectively, $\hat x^i(t) = B_i$]. If (2.2) holds, then Theorem 1.4 continues to hold and the rescaled processes satisfy (8.1.22') with the jump term added.
The limit of the costs can be dealt with via the following theorem.

Theorem 2.4. Assume the conditions of this section and (2.2), and let $h$ index a weakly convergent subsequence of $\{\hat\Psi^h(\cdot), Q^h(\cdot)\}$. Then, with the other notation of Theorem 2.3 used,
(2.6b)

A2.1. For each $\epsilon > 0$ and initial condition of interest, there is an $\epsilon$-optimal solution $(x(\cdot), F(\cdot), w(\cdot), N(\cdot))$ to (8.1.22') with the jump term added, and it is unique in the weak sense.
Theorem 2.5. Assume the conditions of this section and (A2.1), (A2.2). Let $\epsilon > 0$. There is an $\epsilon$-optimal admissible solution $(x(\cdot), z(\cdot), F(\cdot), w(\cdot), N(\cdot))$ to (8.1.22') (with the jump term added) with the following properties. (i) There are $T_\epsilon < \infty$, $\delta > 0$, $\theta > 0$, $k_m < \infty$, and $\rho > 0$, such that $F(\cdot)$ is constant on the intervals $[n\delta, n\delta + \delta)$, only one of the components can jump at a time, and the jumps take values in the discrete set $\{k\rho, k = 0, \ldots, k_m\}$. Also, $F(\cdot)$ is bounded and is constant after time $T_\epsilon$. (ii) The values are determined by the conditional probability law (the expression defines the
Theorem 2.6. Under the conditions of this section, (A2.1) and (A2.2),
$$V^h(x) \to V(x).$$

Proof. We need to adapt the $\epsilon$-optimal control of Theorem 2.5 for the approximating Markov chain. In preparation for the argument, let us first note the following. Suppose that we are given a control of "impulsive magnitude" $v_i\,dF^i$ acting at a time $t_0$. Let the other components $F^j(\cdot), j \ne i$, of the control be zero. Thus, the associated instantaneous change of state is $v_i\,dF^i$. We wish to adapt this control for use on the approximating chain. To do this, first define $n_h = \min\{k : \tau_k^h \ge t_0\}$. Then, starting at step $n_h$ of the approximating chain, we approximate $v_i\,dF^i$ by applying a succession of admissible control steps in conditional mean direction $v_i$, each of the randomized type described in Section 8.3. In more detail, let $E_k^h\Delta\xi_k^h = v_i\Delta F_k^{h,i}$, $k \ge n_h$, denote the sequence of "conditional means," as in Section 8.3. Continue until $\sum_k\Delta F_k^{h,i}$ sums to $dF^i$ (possibly modulo a term which goes to zero as $h \to 0$). There might be some reflection steps intervening if $G$ is ever exited. Let $F^h(\cdot)$ denote the continuous parameter interpolation of the control process just defined. Because the interpolation interval at a control or reflection step is zero, all of the "conditional mean" jumps $v_i\Delta F_k^{h,i}$ occur simultaneously in the interpolation, and the sequence $\{F^h(\cdot)\}$ is tight. Also, the associated error process converges weakly to the zero process. Thus, the weak limit is just the piecewise constant control process with a single jump, which is at time $t_0$ and has the value $v_i\,dF^i$.

With the above example in mind, we are now ready to define the adapted form of the $\epsilon$-optimal control $F(\cdot)$ given in Theorem 2.5. Let $F^h(\cdot)$ denote the continuous parameter interpolation of this adaptation. $F^h(\cdot)$ will be defined so that it is piecewise constant and has the same number of jumps that $F(\cdot)$ has (at most $T_\epsilon/\delta$). Each of the jumps of each component of the control is to be realized for the chain in the manner described in the above paragraph. The limit of the associated sequence $(\psi^h(\cdot), z^h(\cdot), F^h(\cdot), w^h(\cdot), N^h(\cdot))$ will have the distribution of the $(x(\cdot), z(\cdot), F(\cdot), w(\cdot), N(\cdot))$ of Theorem 2.5. The nonzero jumps for the approximating chain are to occur as soon after the interpolated times $n\delta$ as possible, analogous to the situation in Theorem 10.5.2.
Next, determine the jumps $dF_n^{h,i}$ of the control values for the adaptation of $F(\cdot)$ to the approximating chain by the conditional probability law:
$$P\left\{dF_n^{h,a} = k\rho,\ dF_n^{h,b} = 0,\ b \ne a \mid x, F^h(\sigma_i^h), i < n, \psi^h(s), w^h(s), N^h(s), U^h(s), L^h(s), s \le n\delta\right\}$$
Theorem 3.1. Assume the conditions of this section and (A3.1). Then Theorems 1.1 to 1.4 and 1.8 hold. Let $h$ index a weakly convergent subsequence of $\{\hat\Psi^h(\cdot)\}$ with limit denoted by $\hat\Psi(\cdot)$. Then $\hat\Psi(\cdot)$ satisfies (1.1) and (1.2), and the distribution of
does not depend on $t$. This and the weak convergence (Theorem 1.8) yield the theorem. •

$$\bar\gamma^h(x) = \bar\gamma^h(x, u^h) = E^{u^h}\int_0^1\left[\int_U k(\psi^h(t), \alpha)\,m_t^h(d\alpha)\,dt + c'(\psi^h(t))\,dz^h(t)\right]$$
$$\to E^m\int_0^1\left[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + c'(x(t))\,dz(t)\right] = \lim_T \frac{1}{T}E^m\int_0^T\left[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + c'(x(t))\,dz(t)\right] = \gamma(m) \ge \bar\gamma, \qquad (3.2)$$
where $\gamma(m)$ is the cost for the limit stationary process. The same development holds for (1.3b).

As in the previous sections, to complete the proof that
(3.3)
A3.2. For each $\epsilon > 0$, there is a continuous feedback control $u^\epsilon(\cdot)$ which is $\epsilon$-optimal with respect to all admissible controls, and under which the solution to (1.1) and (1.2) is weak sense unique for each initial condition and has a unique invariant measure.

Theorem 3.2. Assume the conditions of this section, (A3.1) and (A3.2). Then (3.3) holds.

Proof. We use the same conventions concerning the chains and the initial state $x$ as given above (3.2), except that the $u^\epsilon(\cdot)$ of (A3.2) replaces $u^h(\cdot)$. Let $\{\xi_n^h, n < \infty\}$ and $\psi^h(\cdot)$ denote the stationary chains under the control $u^\epsilon(\cdot)$. Then the weak convergence (Theorem 1.8), the uniform integrability of (1.27), and the stationarity of the limit yield that, for (1.3a),
with the analogous result for (1.3b), which yields the theorem. •
chapters.
Section 12.7 concerns the problem of numerical approximation for the
nonlinear filtering problem. It is shown how the previous Markov chain ap-
proximations can be used to get effective numerical approximations under
quite weak conditions. For simplicity of notation, the dynamical terms and
cost rates will not depend explicitly on time. The alterations required for
the time dependent case should be obvious. The method is equivalent to
using an optimal filter for an approximating process, which is the approxi-
mating Markov chain, but using the actual physical observations. Conver-
gence is proved. The general idea is quite versatile and has applications
to other problems in "approximately optimal" filtering [102, 98, 17, 18].
Each step of the method is effectively divided into two steps: updating
the conditional probability via the dynamics, and then incorporating the
observation.
Example. Let $T < \infty$ and, for $\delta > 0$, let $N_\delta = T/\delta$ be an integer. Let the system $x(\cdot)$ be defined by (5.1.20), and let $k(\cdot)$ and $g(\cdot)$ be smooth and bounded real valued functions. As in Section 5.1, $T$, $k(\cdot)$, and $g(\cdot)$ play only an auxiliary role in the construction of the approximating chain. For $t < T$ and a feedback control $u(\cdot)$, define the cost function
The differential operator of $x(\cdot)$ is
$$\mathcal{L}^\alpha f(x) = b(x, \alpha)\frac{\partial f(x)}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2 f(x)}{\partial x^2}.$$
Substituting (1.2) into (1.1), letting $W^{h,\delta}(x, t, u)$ denote the solution to the finite difference equation with $x$ an integral multiple of $h$ and $n\delta < T$, collecting terms, and multiplying all terms by $\delta$ yields the expression
$$W^{h,\delta}(x, n\delta, u) = W^{h,\delta}(x+h, n\delta+\delta, u)\left[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^+(x, u(x, n\delta))\frac{\delta}{h}\right]$$
$$+\ W^{h,\delta}(x-h, n\delta+\delta, u)\left[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^-(x, u(x, n\delta))\frac{\delta}{h}\right]$$
$$+\ W^{h,\delta}(x, n\delta+\delta, u)\left[1 - \sigma^2(x)\frac{\delta}{h^2} - |b(x, u(x, n\delta))|\frac{\delta}{h}\right] + k(x, u(x, n\delta))\delta. \qquad (1.3)$$
Suppose that $\delta$ and $h$ are chosen so that the coefficient of the $W^{h,\delta}(x, n\delta+\delta, u)$ term is nonnegative. Then the coefficients can be considered to be the transition function of a Markov chain. Defining $p^{h,\delta}(x, y|\alpha)$ in the obvious way, we then rewrite (1.3) as
$$W^{h,\delta}(x, n\delta, u) = \sum_y p^{h,\delta}(x, y|u(x, n\delta))\,W^{h,\delta}(y, n\delta+\delta, u) + k(x, u(x, n\delta))\delta. \qquad (1.4)$$
The $p^{h,\delta}(x, y|\alpha)$ are the transition probabilities for a controlled Markov chain. Let the associated Markov chain be denoted by $\{\xi_n^{h,\delta}, n < \infty\}$. Note that
$$E_{x,n}^{h,\alpha}\Delta\xi_n^{h,\delta} = b(x, \alpha)\delta, \qquad \operatorname{cov}_{x,n}^{h,\alpha}\Delta\xi_n^{h,\delta} = \sigma^2(x)\delta + O(h\delta).$$
Let $\delta \to 0$ and $h \to 0$ together. By analogy to the definition (4.1.3), we say that the "explicit" controlled Markov chain $\{\xi_n^{h,\delta}, n < \infty\}$ with interpolation interval $\delta > 0$ is locally consistent with $x(\cdot)$ if
Remarks. The example shows that one can treat the fixed terminal time problem quite similarly to the unbounded time problem. Notice the following important point. Let $p^h(x, y|\alpha)$ be the transition probabilities in Section 5.1, Example 4. Then, for $y \ne x$,
$$\frac{p^{h,\delta}(x, y|\alpha)}{1 - p^{h,\delta}(x, x|\alpha)} = p^h(x, y|\alpha). \qquad (2.1)$$
That is, under the control parameter $\alpha$, and given that $\xi_n^{h,\delta} = x$, the probability that $\xi_{n+1}^{h,\delta} = y \ne x$ equals $p^h(x, y|\alpha)$. Let $\delta \to 0$ and $h \to 0$ together. Then, the requirement that both pairs $(p^h(x, y|\alpha), \Delta t^h(x, \alpha))$ and $(p^{h,\delta}(x, y|\alpha), \delta)$ be locally consistent implies that, modulo small error terms,
$$1 - p^{h,\delta}(x, x|\alpha) = \frac{\delta}{\Delta t^h(x, \alpha)} \in (0, 1]. \qquad (2.3)$$
we get the transition probabilities $p^{h,\delta}$ for the explicit method and the associated approximating Markov chain $\{\xi_n^{h,\delta}, n < \infty\}$ from (2.1) and (2.3).
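Relations (2.1) and (2.3) amount to a mechanical recipe for converting any locally consistent pair $(p^h, \Delta t^h)$ into an explicit-method chain. A sketch under the stated assumptions (zero-diagonal $p^h$, and $\delta$ no larger than the smallest interpolation interval):

```python
import numpy as np

def explicit_from_consistent(p_h, dt_h, delta):
    """Build p^{h,delta} from (p^h, dt^h) via (2.1) and (2.3):
    1 - p^{h,d}(x, x) = delta / dt^h(x), with off-diagonal mass
    distributed proportionally to p^h(x, y).  p_h: (n, n) matrix with
    zero diagonal and unit row sums; dt_h: length-n array."""
    esc = delta / dt_h                       # escape probability per step
    assert np.all(esc <= 1.0), "choose delta <= min dt^h(x)"
    P = p_h * esc[:, None]                   # p^{h,d}(x,y) = esc(x) p^h(x,y)
    P[np.diag_indices_from(P)] = 1.0 - esc   # self-transition probability
    return P
```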
$$f_t(x, t) \to \frac{f(x, t+\delta) - f(x, t)}{\delta},$$
$$f_x(x, t) \to \frac{f(x+h, t) - f(x, t)}{h} \quad \text{if } b(x, u(x, t)) \ge 0,$$
$$f_x(x, t) \to \frac{f(x, t) - f(x-h, t)}{h} \quad \text{if } b(x, u(x, t)) < 0, \qquad (3.1)$$
$$f_{xx}(x, t) \to \frac{f(x+h, t) + f(x-h, t) - 2f(x, t)}{h^2}.$$
Note that the last three equations of (3.1) use the value $t$ on the right side rather than $t + \delta$ as in (1.2). Using (3.1) and repeating the procedure which
led to (1.4), for $n\delta < T$ we get the standard finite difference approximation
$$W^{h,\delta}(x, n\delta, u)\left[1 + \sigma^2(x)\frac{\delta}{h^2} + |b(x, u(x, n\delta))|\frac{\delta}{h}\right] = W^{h,\delta}(x, n\delta+\delta, u)$$
$$+ \left[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^+(x, u(x, n\delta))\frac{\delta}{h}\right]W^{h,\delta}(x+h, n\delta, u)$$
$$+ \left[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^-(x, u(x, n\delta))\frac{\delta}{h}\right]W^{h,\delta}(x-h, n\delta, u) + k(x, u(x, n\delta))\delta. \qquad (3.2)$$
It can be seen from this that we can consider the $p^{h,\delta}$ as a one step transition probability of a Markov chain $\{\zeta_n^{h,\delta}, n < \infty\}$ on the "(x, t)-state space"
$$\{0, \pm h, \pm 2h, \ldots\} \times \{0, \delta, 2\delta, \ldots\}.$$
It is evident that time is being considered as just another state variable. Refer to Figure 12.2b for an illustration of the transitions. Note that, analogously to (1.6), for $x \ne y$ we have
$$p^{h,\delta}(x, n\delta; y, n\delta|\alpha) = p^h(x, y|\alpha) \times \text{normalization}(x), \qquad (3.3)$$
where the $p^h(x, y|\alpha)$ are the transition probabilities of Example 4 of Section 4.1. Analogously to what was done in Section 12.2, this relationship will be used in Section 12.4 to get a general implicit Markov chain approximation, starting with any consistent (in the sense of Chapter 4) approximation. Write $\zeta_n^{h,\delta} = (\xi_n^{h,\delta}, \zeta_{n,0}^{h,\delta})$, where the 0-th component $\zeta_{n,0}^{h,\delta}$ represents the time variable, and $\xi_n^{h,\delta}$ represents the original "spatial" state. Then we have
$$p^h(x, y|\alpha) = \frac{p^{h,\delta}(x, n\delta; y, n\delta|\alpha)}{1 - p^{h,\delta}(x, n\delta; x, n\delta+\delta|\alpha)}. \qquad (4.1)$$
This will be done via the local consistency requirements on both $(p^h(x, y|\alpha), \Delta t^h(x, \alpha))$ and the corresponding quantities for the implicit approximation. The conditional mean one step increment of the "time" component of $\zeta_n^{h,\delta}$ is
$$E_{x,n}^{h,\alpha}\Delta\zeta_{n,0}^{h,\delta} = p^{h,\delta}(x, n\delta; x, n\delta+\delta|\alpha)\,\delta, \qquad (4.2)$$
and we define the interpolation interval $\Delta\bar t^{h,\delta}(x, \alpha)$ by (4.2). Of course, we can always add a term of smaller order. The consistency equation for the spatial component $(\xi_n^{h,\delta})$ of the chain is (modulo a negligible error term)
Let $u = \{u_n^{h,\delta}, n < \infty\}$ be an admissible control sequence for $\{\zeta_n^{h,\delta}, n < \infty\}$. Define $\Delta\bar t_n^{h,\delta} = \Delta\bar t^{h,\delta}(\xi_n^{h,\delta}, u_n^{h,\delta})$, and let $E_{x,n}^u$ denote the expectation given use of $u$ and $\xi_n^{h,\delta} = x$, $\zeta_{n,0}^{h,\delta} = n\delta$. Then the solution to (3.2) with the boundary condition $W^{h,\delta}(x, T, u) = g(x)$ can be written as
(4.5)
Next, define the "interpolated" times $\bar t_n^{h,\delta} = \sum_{i=0}^{n-1}\Delta\bar t_i^{h,\delta}$, and define the continuous parameter interpolations $\zeta^{h,\delta}(\cdot) = (\xi^{h,\delta}(\cdot), \zeta_0^{h,\delta}(\cdot))$ and $u^{h,\delta}(\cdot)$ by
$$u^{h,\delta}(t) = u_n^{h,\delta}, \quad \zeta^{h,\delta}(t) = \zeta_n^{h,\delta} \quad \text{for } t \in [\bar t_n^{h,\delta}, \bar t_n^{h,\delta} + \Delta\bar t_n^{h,\delta}). \qquad (4.7)$$
Then, for $t = n\delta$, we can write (4.5) and (4.6) as, respectively,
$$\nu_n = \min\{i > \nu_{n-1} : \zeta_{i,0}^{h,\delta} - \zeta_{i-1,0}^{h,\delta} = \delta\},$$
$$U(x) = \sum_y p^{h,\delta}(x, n\delta; y, n\delta)\,U(y) + p^{h,\delta}(x, n\delta; x, n\delta+\delta)\,g(x). \qquad (4.8)$$
The Cost Function. The system will be the controlled diffusion (5.3.1) or the jump diffusion (5.6.1). All of the problem formulations of Chapters 10 and 11 can be carried over to the finite time case, and the proofs require no change, given the appropriate locally consistent transition probabilities. However, for illustrative purposes, we next set up the computational problem for the case of an absorbing boundary. Assume:
for $x \in G^0$ and $t < T$, with the boundary conditions $V(x, t) = g(x, t)$, $x \notin G^0$, or $t \ge T$.
for $x \in G_h^0$, $n\delta < T$, and with the same boundary condition as for (5.2).
Solving the Explicit Equation (5.3). Because the minimum cost function values $V^{h,\delta}(y, n\delta+\delta)$ at time $n\delta+\delta$ are on the right side of (5.3), while the value at state $x$ at time $n\delta$ is on the left side, (5.3) is solved by a simple backward iteration.

Solving the Implicit Equation (5.4). In (5.4), if the time on the left is $n\delta$ and the state there is $x$, then the only value of the minimum cost function at time $n\delta+\delta$ which is on the right side is $V^{h,\delta}(x, n\delta+\delta)$. The $V^{h,\delta}(y, n\delta)$ on the right, for $y \ne x$, are evaluated at time $n\delta$. Hence, (5.4) cannot be solved by a simple backward iteration. However, all of the methods of Chapter 6 for solving (6.1.1) or (6.1.2) can be used. The general approach is as follows: Let $N_\delta\delta = T$. Starting with time $N_\delta\delta$, we have the boundary condition $V^{h,\delta}(x, N_\delta\delta) = g(x, N_\delta\delta)$. Suppose that $V^{h,\delta}(x, n\delta+\delta)$ is available for some $n < N_\delta$. Then there is a $C(\cdot)$ such that we can rewrite (5.4) as
for all $\alpha$ and $x$. This and the continuity of the transition probabilities implies that the effective transition function in (6.1) is a contraction for all feedback controls, and that the contraction is uniform in the control. Now simply apply to (6.1) any of the methods of Chapter 6 which can be used for (6.1.1) or (6.1.2).
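Because the effective transition function is a uniform contraction, even a plain fixed-point (Jacobi-type) sweep converges at each time level. A minimal sketch; the array names are hypothetical stand-ins for the terms of (5.4)/(6.1), with the same-time-level couplings collected into one matrix:

```python
import numpy as np

def solve_implicit_level(P_same, p_advance, V_next, k_run, dt, tol=1e-10):
    """Solve one time level of the implicit method:
    V(x) = sum_y P_same(x,y) V(y) + p_advance(x) V_next(x) + k_run(x) dt(x),
    by Jacobi iteration.  P_same couples states at the same time level and
    has row sums 1 - p_advance < 1, so the map is a sup-norm contraction."""
    V = V_next.astype(float).copy()          # any starting guess works
    const = p_advance * V_next + k_run * dt  # fixed during the sweep
    while True:
        V_new = P_same @ V + const
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```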
Figure 12.1. The decomposition regions.
Now the computational procedure is as follows: For given $n$, let the values $V^{h,\delta}(x, n\delta+\delta)$ be given for all $x$. Calculate $V^{h,\delta}(b, n\delta)$ in terms of these given values via the explicit method. Then, at time $n\delta$, the values of $V^{h,\delta}(x, n\delta)$ are known, except for $x \in G_h^0 \cap (a, b)$ and $G_h^0 \cap (b, c)$, and the two halves can be worked on simultaneously via the implicit method.

The sequence $V^{h,\delta}(x, n\delta)$ can be shown to converge to the correct value $V(x, t)$ as $h \to 0$, $\delta \to 0$, and $n\delta \to t$. Clearly, more than one subdivision of the state space can be used, and the procedure can also be used in higher dimensions. We have no actual computational experience with the method. See also [96] for an analysis of decomposition algorithms from the point of view of controlled Markov chains.
Let $\mathcal{L}^*$ denote the formal adjoint of the differential operator $\mathcal{L}$ of $x(\cdot)$. Under appropriate regularity conditions [49, 54, 60], $x(t)$ has a density $p(\cdot, \cdot)$, and it is the solution to the Fokker-Planck or "forward" Kolmogorov equation
$$p_t(y, t) = \mathcal{L}^* p(y, t) = \frac{1}{2}\sum_{i,j}\frac{\partial^2\left(a_{ij}(y)\,p(y, t)\right)}{\partial y_i\,\partial y_j} - \sum_i\frac{\partial\left(b_i(y)\,p(y, t)\right)}{\partial y_i}, \qquad (7.1)$$
with the initial condition $p(y, 0) = \delta(x - y)$, where $\delta(\cdot)$ is the Dirac delta function; that is, $x(0)$ is concentrated at the initial point $x$. There is an analogous (partial differential integral) equation for the jump diffusion case.
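Numerically, the weak sense density can be propagated by the same locally consistent transition probabilities, applied forward in time rather than backward: if $p_n$ is the row vector of state probabilities at step $n$, then $p_{n+1} = p_n P^{h,\delta}$. A minimal sketch:

```python
import numpy as np

def propagate_density(P, p0, n_steps):
    """Forward (Fokker-Planck) evolution of the weak sense density under
    an approximating chain: p_{n+1} = p_n P.  P: (n, n) transition matrix
    of a locally consistent chain; p0: initial distribution (a point mass
    delta_x is a one-hot vector).  Illustrative sketch only."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ P
    return p
```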
For many problems of interest, (7.1) has only a formal meaning, because
either the density does not exist or it is not smooth enough to satisfy (7.1).
Henceforth, when referring to the density, we mean the weak sense density
or (equivalently) the distribution function. We next discuss the problem
of the approximate calculation of the density. For numerical purposes, it
is usually necessary to work in a bounded state space. The state space G
might be bounded either naturally or because a bound was imposed for
numerical reasons. For example, the process dynamics can be such that the
process never leaves some bounded set, even without reflection or absorp-
tion: The process might be reflected back from the boundary of G, or it
might be stopped or killed once it leaves the interior G0 • The exact form
of the process is not important here, provided only that we work with ap-
propriately consistent approximations. Because the process is uncontrolled,
12.7 Nonlinear Filtering 341
we drop the control parameter in the notation for the approximating tran-
sition probability ph,lJ. For purposes of simplicity in the exposition and
because the boundary condition plays only a secondary role, we make the
assumption in the general form (A7.1) below. The details can be filled in by
referring to the weak convergence results for the various cases of Chapters
10 and 11.
Let ph,lJ (x, n~, y) denote the n-step transition function. Then, as (~. h) -t
0 and n8 -t t ~ 0, we have
ph,lJ(x,n8,·) -t p(x,t,·)
in the sense that for any bounded and continuous real valued function if>(-},
Then the optimal nonlinear filter for an approximating process will be de-
fined, and finally the numerical method will be given. The development is
an extension and simplification of that in [90].
For¢(·) an arbitrary bounded and continuous real valued function, define
the conditional expectation operator Et by
Et¢(x(t)) = E[¢(x(t))!Y(t)].
One of the most important results in the theory of nonlinear filtering is the
so-called representation theorem, which is a "limit" form of Bayes rule. We
will use it in the form which was used in the original derivation of the filter
[87], which is particularly convenient for the types of approximation and
weak convergence methods which will be used. That reference derived the
result for the diffusion case. The pure jump process (no diffusion compo-
nent) case was first derived by [136] and [156]. More modern developments
via measure transformation and martingale techniques are in [51, 116]. but
the representations obtained by those techniques are the same as used here
for the processes of concern here.
Let i(·) be a process with the same probability law as x(·) has, but which
is independent of (x(·), y(·)). Define
Except for a few special cases, the evaluation of (7.5) is not a finite calculation. The best known cases where the calculation is (essentially) finite are (i) the Kalman-Bucy filter, where the process is not reflected or killed, $g(\cdot)$ and $b(\cdot)$ are linear functions of $x$, $q(x, \gamma) = 0$, and $a(\cdot)$ does not depend on $x$; and (ii) $x(\cdot)$ is a finite state Markov chain. Numerical methods for the approximate evaluation of the conditional distribution were given in [90], and related variations were in [39] and [92]. Robustness (locally Lipschitz continuity) of the numerical approximations with respect to the observation process was shown in [92], and this property is enjoyed by the algorithm to be described in Subsection 12.7.3.
It is important to keep in mind that all approximations to the nonlin-
ear filtering problem are actually approximations to some representation
of Bayes rule. Other approximations to Bayes rule for the problem at hand
might lead to preferable procedures in particular cases. The procedures to
be described below are of the same type as in these references, but can use
more general approximating processes. The basic idea depends on two facts.
First, if the signal x( ·) were a finite state discrete time Markov chain, then
The Optimal Filter for a Markov Chain Signal Process. Let $\{\xi_n, n < \infty\}$ be a finite state Markov chain with one step transition probabilities $p(x, y)$. Let $v$ be a positive real number, and suppose that $\{\psi_n, n < \infty\}$ is a sequence of mutually independent normally distributed random variables with mean zero and covariance $vI$, which is also independent of the $\{\xi_n, n < \infty\}$. Suppose that we observe the white noise corrupted data $y_n = g(\xi_n) + \psi_n$ at time step $n$, for some bounded function $g(\cdot)$. Define $Y_n = \{y_i, i \le n\}$ and the conditional distribution
We now use Bayes rule to define a convenient recursive formula for $Q_n(x)$. Let the expression $P\{y_n|\xi_n = x\} = P\{y_n|\xi_n = x, \xi_{n-1} = y\}$ denote the conditional (normal with mean $g(x)$ and covariance $vI$) density of the observation at the value $y_n$. Note that
$$Q_n(x) = \frac{\sum_y \exp\left[\frac{1}{v}g(x)'y_n - \frac{1}{2v}|g(x)|^2\right]p(y, x)\,Q_{n-1}(y)}{\text{normalization}}, \qquad (7.6)$$
where, in both cases, the normalization is just the numerator summed over $x$.
The Optimal Filter for the "Explicit" Chain $\{\xi_n^{h,\delta}, n < \infty\}$ of Sections 12.1 and 12.2. Let us specialize the result (7.6) to the chain introduced in Sections 12.1 and 12.2. Thus, $\xi_n$ is replaced by $\xi_n^{h,\delta}$, and the
where $q_0^{h,\delta}(x) = Q_0^{h,\delta}(x)$ and $q_n^{h,\delta}(x)$ equals $Q_n^{h,\delta}(x)$ times a normalizing factor which depends only on the data $Y_n^{h,\delta}$. Note that (7.8) can be divided into the two steps: first update the effects of the dynamics as in
$$\sum_y p^{h,\delta}(y, x)\,q_{n-1}^{h,\delta}(y),$$
The Filter for the "Implicit" Method. Recall the definition of the chain $\{\zeta_n^{h,\delta}\}$ and its one step transition probabilities $\bar p^{h,\delta}(x, y)$ defined at the end of Section 12.4. To get the filter for the implicit method, simply replace the $p^{h,\delta}(x, y)$ in (7.7), (7.8), or (7.9) by $\bar p^{h,\delta}(x, y)$.
12.7.3 The approximation to the optimal filter for x(·), y(·)

The numerical approximation to the optimal filter (7.5) is just either (7.7), (7.8), or (7.9) with the actual physical observations $y(n\delta+\delta) - y(n\delta)$ used in place of $\Delta y_n^{h,\delta}$. Both (7.7) and (7.8) provide recursive formulas which can be used for the actual computation, and one representation for the recursion is in [90, pp. 132-133]. The initial condition $Q_0^{h,\delta}(\cdot)$ is any approximation to the a priori weak sense density of $x(0)$ which converges weakly to that density as $h$ and $\delta$ go to zero. As noted above, these equations can be used for the implicit method also. Note the two step (update dynamics, incorporate the observation) division as below (7.8).
In the models dealt with in Chapters 10-12, such as (10.1.1), (11.1.1) and their time varying forms in Chapter 12, neither the noise coefficient $\sigma(\cdot)$ nor the jump coefficient $q(\cdot)$ depended on the control. However, the control dependent forms are treated by methods which are very similar to those used in Chapters 10-12, and identical convergence results are obtainable under the natural extensions of the conditions used in those chapters. Local consistency remains the primary requirement. Recall that relaxed controls were introduced owing to the issue of closure: A bounded sequence of solutions $x^n(\cdot)$ under ordinary controls $u^n(\cdot)$ (for either an ordinary or a stochastic differential equation) would not necessarily have a subsequence which converged to a limit process which was a solution to the equation driven by an ordinary control. But, if the controls (whether ordinary or relaxed) were represented as relaxed controls, then the sequence of (solutions, controls) was compact, so that we could extract a convergent subsequence and the limit solution was driven by the limit relaxed control. While the introduction of relaxed controls enlarged the problem, it did not affect the infimum of the costs or the numerical method. It was used purely for mathematical purposes, and not as a practical control.
It will be seen, via simple examples, that analogous issues of closure or compactness arise when the variance or the jump is controlled, even with the use of relaxed controls. The problem now is not the relaxation of the notion of control, but of the driving Wiener or Poisson processes, and leads to the so-called martingale measure and relaxed Poisson measure. These concepts are used for mathematical purposes only. They allow the desired closure and do not affect the infima of the cost functions or the numerical algorithms.

To simplify the development, we will concentrate on the discounted cost problem with no stopping time and a reflecting boundary. But extensions to impulse control, optimal stopping or absorbing boundaries, and to the various combinations of these are all straightforward.

We will start with the variance control problem and show why the driving Wiener process concept is inadequate in its classical form for the convergence and approximation. Then the extension, the so-called martingale measure, will be developed. It will then be seen (as in [94, Section 8]) that all of the approximation and limit theorems of the previous chapters can be carried over. The use of this extension of the driving process does not actually alter the model, from the point of view of applications. Then, the problem with controlled jumps will be treated by an analogous method, adapted from [100, Chapter 11]. Problems with both controlled variance and jumps can also be treated, with the obvious combinations of the methods which are used for each alone. Throughout the section, it is assumed that (A10.1.1)-(A10.1.4) hold, unless noted otherwise, as do the assumptions on the set $G$ and boundary reflection directions in Section 5.7.

The jump component does not affect the development of the central issues concerning variance control, and it will not be included in this section. Define $a(x, \alpha) = \sigma(x, \alpha)\sigma'(x, \alpha) = \{a_{ij}(x, \alpha)\}$.
The following examples will illustrate the problems and processes which arise when we take limits of sequences of solutions corresponding to either ordinary or relaxed controls.

The set $(x^n(\cdot), z^n(\cdot), m^n(\cdot), w^n(\cdot))$ is tight, and we need only characterize the weak sense limit process. The differential operator of the solution process under $u^n(\cdot)$ and at $x \in G^0$ and arbitrary $t$ is
where $z(\cdot)$ is the reflection process. There are two ways of representing the limit as a stochastic differential equation, depending on how the term $\sum_i a(x, \alpha_i)/2$ is factored. The set $(x(\cdot), z(\cdot), m(\cdot))$ can be represented in terms of a single Wiener process in the sense that there is a standard Wiener process $w(\cdot)$, with respect to which the other processes are nonanticipative and such that
$$x(t) = x(0) + \frac{1}{2}\int_0^t\left[b(x(s), \alpha_1) + b(x(s), \alpha_2)\right]ds + \frac{1}{\sqrt 2}\int_0^t\left[a(x(s), \alpha_1) + a(x(s), \alpha_2)\right]^{1/2}dw(s) + z(t),$$
or, in terms of two mutually independent Wiener processes $w_1(\cdot), w_2(\cdot)$,
$$x(t) = x(0) + \frac{1}{2}\int_0^t\left[b(x(s), \alpha_1) + b(x(s), \alpha_2)\right]ds + \frac{1}{\sqrt 2}\int_0^t\sigma(x(s), \alpha_1)\,dw_1(s) + \frac{1}{\sqrt 2}\int_0^t\sigma(x(s), \alpha_2)\,dw_2(s) + z(t).$$
In the general case, with $p_i(s)$ denoting the weight which the limit control puts on $\alpha_i$ at time $s$,
$$x(t) = x + \int_0^t\sum_i p_i(s)\,b(x(s), \alpha_i)\,ds + \int_0^t\left[\sum_i p_i(s)\,a(x(s), \alpha_i)\right]^{1/2}dw(s) + z(t), \qquad (1.4)$$
with any "measurable" choice of the square root process used. Again, owing
to the way that the second order part of the limit differential operator splits
into k parts, each part having a different control value, we can also represent
the solution in terms of a set of mutually independent vector-valued Wiener
processes wi(·), i = 1, ... , k, as
the limit m( ·) does not appear in a recognizable form in the variance term.
The representation {1.5), despite the use of a larger number of Wiener pro-
cesses, does not suffer from these disadvantages, although appearance of the
Pi (·) in square root form is still awkward. The limit processes x( ·) defined
by {1.4) and {1.5) are the same, despite the different representations.
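In the scalar case, the agreement between the chattering limit and the averaged representation can be checked by simulation: alternating between $\alpha_1$ and $\alpha_2$ on a fast time scale produces statistics close to those of the form (1.4) with $p_1 = p_2 = 1/2$. A sketch under invented coefficients (none of the functions below come from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def b(x, a):   return -x * (1.0 + a)   # illustrative drift
def sig(x, a): return 1.0 + 0.5 * a    # illustrative diffusion coefficient

def chattering_path(dt, n, x0=1.0):
    """Euler path alternating the control between a1=0 and a2=1 each step."""
    x = x0
    for i in range(n):
        a = i % 2                       # the rapidly switching control
        x += b(x, a) * dt + sig(x, a) * np.sqrt(dt) * rng.standard_normal()
    return x

def limit_path(dt, n, x0=1.0):
    """Euler path for the limit form (1.4) with p1 = p2 = 1/2: averaged
    drift, and the square root of the averaged variance a = sigma^2."""
    x = x0
    for _ in range(n):
        drift = 0.5 * (b(x, 0) + b(x, 1))
        var = 0.5 * (sig(x, 0) ** 2 + sig(x, 1) ** 2)
        x += drift * dt + np.sqrt(var * dt) * rng.standard_normal()
    return x

# Sample means of the terminal values should be close for small dt.
print(np.mean([chattering_path(1e-3, 1000) for _ in range(400)]),
      np.mean([limit_path(1e-3, 1000) for _ in range(400)]))
```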
The question concerns the representation of the family of possible processes which are limits of solutions to (1.1), analogously to what was done with relaxed controls in the previous chapters. It is clear that the representation of the controlled process in the form (1.1) is not adequate. Keep in mind that we are concerned with convenient representations. The key to resolving the problem of representation lies in the fact that the differential operator of the limit process always has the form (written in differential notation)
To prepare for what follows, let us write (1.5) in an equivalent form. Let $\{\mathcal{F}_t, t \ge 0\}$ denote the filtration engendered by the processes $\{x(\cdot), z(\cdot), m(\cdot), w_i(\cdot), i \le k\}$ in (1.5). Define the (vector) measure-valued $\mathcal{F}_t$-martingale process $M(\cdot)$ by its values on the sets $A \in \mathcal{B}(U)$:
Tightness. Let $M^n(\cdot)$ (for each $n$) and $M(\cdot)$ be martingale measures (with respect to the filtrations which they engender) and with quadratic variation processes $m^n(\cdot)I$ [respectively, $m(\cdot)I$], where the $m^n(\cdot)$ and $m(\cdot)$ are relaxed controls. The weak topology is used on the space of martingale measures. Thus, $(M^n(\cdot), m^n(\cdot))$ converges weakly to $(M(\cdot), m(\cdot))$ if and only if
defined by (1.9) is $\int_0^t\int_U a(s, \alpha)\,m_s(d\alpha)\,ds$. In this case, the process defined by (1.9) is equivalent to a stochastic integral with respect to a finite number of Wiener processes. Then, extend to $f(\cdot)$ still depending on only a finite number of values of $\alpha$ but satisfying
(1.10)
with probability one for each $t$. Finally, approximate an $f(\cdot)$ with a general dependence on $\alpha$ by a sequence of simple functions, analogously to what is done for the real valued martingale case. (See the remarks on the proof of Theorem 1.2.)
A1.1. For each pair $(M(\cdot), m(\cdot))$ there is a unique weak sense solution to (1.8).
The $1 - q_{ii}$ is in the $i$th place. Suppose that the spectral radius of the matrix $\{|q_{ij}|\}$ is less than unity. Consider the problem $\phi_i(t) = \psi_i(t) + z_i(t)$, where the $\psi_i(\cdot)$ are in $D^r[0, \infty)$ and $z_i(\cdot)$ is the reflection term. Then [42], [100, Section 3.4], there is a constant $C$, not depending on the $\psi_i(\cdot)$, such that
Using this, the conditions on $(b(\cdot), \sigma(\cdot))$ and a Picard iteration establish that, for any $(M(\cdot), m(\cdot))$, there is a unique strong sense solution to (1.8).
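In outline, the Picard argument iterates the solution map until it contracts. The following sketch is illustrative only: it drops the martingale measure and the reflection term, assumes a Lipschitz drift $b$, and shows the geometric decay of the sup-norm gaps between successive iterates of $x(t) = x_0 + \int_0^t b(x(s))\,ds$.

import numpy as np

# Picard iteration for x(t) = x0 + int_0^t b(x(s)) ds with Lipschitz b.
# In the text the same contraction argument is run pathwise, with the
# stochastic integral against the martingale measure included.
b = lambda x: np.sin(x)          # an assumed Lipschitz drift
x0, T, n = 1.0, 1.0, 1000
dt = T / n

x = np.full(n + 1, x0)           # zeroth iterate: constant path
for k in range(8):
    integrand = b(x)
    # cumulative trapezoidal integral of b(x(s)) over [0, t]
    integral = np.concatenate(([0.0],
        np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dt)))
    x_new = x0 + integral
    print(k, np.max(np.abs(x_new - x)))   # gap shrinks geometrically
    x = x_new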
The process defined by (1.8) is the natural representation for the prob-
lem where the variance is controlled. Suppose that $(x(\cdot), m(\cdot))$ solves the martingale problem with differential operator defined by (in differential form)
$$df(x(t)) = \int_U \mathcal{L}^\alpha f(x(t))\, m_t(d\alpha)\, dt + f_x'(x(t))\, dz(t), \tag{1.11}$$
where $\mathcal{L}^\alpha$ is the operator with the control fixed at $\alpha$. Then (by augmenting the probability space if necessary, by the addition of an "independent process" [50]), there always is a martingale measure such that (1.8) holds.
$$\cdots + \sum_{i\Delta < t} \int_{i\Delta}^{i\Delta + \Delta}\int_U \sigma(x^n(i\Delta), \alpha)\, M^n(d\alpha\, ds) + z^n(t) + \rho_\Delta^n(t), \tag{1.12}$$
where, for each $T < \infty$,
$$\lim_{\Delta \to 0} \sup_n E \sup_{s \le T} |\rho_\Delta^n(s)| = 0.$$
By the weak convergence, (1.12) holds with the superscript n dropped. The
proof of nonanticipativeness follows the lines used in the proof of Theorem
10.1.1, and the details are omitted. Given the nonanticipativeness, let $\Delta \to 0$ to get the representation (1.8).
on m(·). Theorem 1.1 implies that there is an optimal control in the class of
models (1.8). All of the results remain true if a stopping time or boundary
absorption is added, with the associated conditions from Chapters 10 or 11
used, and analogously for the finite time problems of Chapter 12.
Analogously to what was done in Sections 10.2 and 10.3, we need to
know that
and that any process of the type (1.8) and its associated cost can be arbi-
trarily well approximated by a classical controlled diffusion process and its
associated cost. This is implied by the next theorem.
(1.14)
$$M^\delta(U_i^\delta, t) = \int_0^t \left[m_s(U_i^\delta)\right]^{1/2} dw_i^\delta(s). \tag{1.16}$$
Remark on the Proof. The tightness follows from the results of Chapter 11. Recall the definition of $w^h(\cdot)$ in (10.4.5). Define the measure-valued process $M^h(\cdot)$ by
$$M^h(A, t) = \int_0^t I_{\{u^h(s) \in A\}}\, dw^h(s),$$
Remark on the Proof. The previous theorem implies that $\liminf_h V^h(x)$
13.2 Controlled Jumps
is
$$x(t) = x + \int_0^t b(x(s), u(s))\, ds + \int_0^t \sigma(x(s))\, dw(s) + \int_0^t \int_\Gamma q(x(s-), \gamma, u(s))\, N(ds\, d\gamma) + z(t). \tag{2.1}$$
Note how the control affects the jump. The jump still occurs at random,
and the controller does not know the jump times until they occur. Thus,
the jumps cannot be controlled directly, but only via the overall control
policy. The cost functions (11.1.3) will be used. But, as in the previous
section, all of the results remain true if a stopping time or boundary ab-
sorption is added, with the associated conditions from Chapters 10 or 11
used, and analogously for the finite time problems of Chapter 12. Also,
variance control can be combined with jump control.
Such controlled jumps have arisen in telecommunications, and an exam-
ple from a polling problem where some queue is occasionally unavailable to
be served is in [4, 100]. The example concerns a wireless communications system where the sources send data that is created in a random and bursty way and buffered until transmitted; the sources can be occasionally unavailable to the fixed base station antenna due to their physical movement. The state is the total amount of work that is in the buffers of
all of the sources and the control policy is the balance of the total buffered
work between the sources. The jump is due to the increase in work if one
source becomes unavailable for a period of time which is longer than what
is needed to handle all of the work in the other available sources plus what
arrives during that interval.
The issues of convergence are similar to those that arose with variance
control in Section 1, and which led to the introduction of the (control
dependent) martingale measure as the basic driving process. If $m^n(\cdot)$ is a sequence of admissible relaxed controls with corresponding solutions $(x^n(\cdot), z^n(\cdot))$, then there might be a (weakly) convergent subsequence of $(x^n(\cdot), z^n(\cdot), m^n(\cdot))$ whose limit does not satisfy (2.1) for any Wiener process, Poisson measure, and admissible control $u(\cdot)$. Even in the relaxed control framework, the best way of representing the limit controlled jump term is not a priori clear: The limit of the distributions of the jumps depends on the limits of the set of control values at the random times of the jumps. The derivative of the relaxed control representation of $u^n(\cdot)$ is a measure that is concentrated at $u^n(t)$. But the derivative $m_t(\cdot)$ of the limit relaxed control is defined only almost everywhere, and is not necessarily a limit of the $m_t^n(\cdot)$.
As done with both relaxed controls and martingale measures, to get the
desired closure or compactness, it is necessary to enlarge the model. This
will be done by introducing the concept of relaxed Poisson measure as a
driving process to replace the Poisson measure. The relaxed Poisson mea-
sure functions essentially as did the martingale measure of the last section.
It enlarges the problem, enabling convergence theorems to be proved, but
it does not change the infimum of the costs. The following assumption will
be used.
A2.1. For each initial condition and admissible pair (w(·), m(·)), there is a
weak sense solution to the system (2.1) without the jumps, and it is unique
in the weak sense.
$$p(t) = \int_0^t \int_\Gamma \gamma\, N(ds\, d\gamma)$$
The solution to (2.1) on $[0, \tau)$, whether or not the jump time and value are controlled, does not depend on the value of the jump. As a consequence, $x(t)$ is well defined up to the time of the first jump, and so the distribution of the first jump is also well defined. One can proceed in this way to define the solution of (2.1) for all $t$. By the continuity of $q(x, \gamma, \alpha)$ in $(x, \alpha)$ for each value of $\gamma$, the distribution of the jump at $\tau$ is weakly continuous in the control value $u(\tau)$ and state value $x(\tau-)$, and $u(\tau)$ can depend only on the system data up to time $\tau-$.
Let $m^\delta(\cdot)$ denote the relaxed control representation of $u^\delta(\cdot)$. Then $m_t^\delta(\alpha_i) = I_i^\delta(t)$. Let $\delta \to 0$. Then $m^\delta(\cdot)$ converges weakly to $m(\cdot)$ with $m_t(\alpha_i) = \bar v_i$. The set (over $\delta$ and the jump index) of all jumps is tight, as is the set of interjump sections of $x^\delta(\cdot)$. Fix a weakly convergent subsequence of the interjump sections and the jumps. Then (between jumps) the limit of the chosen subsequence can be represented as
by
$$\sum_i \int_0^t \int_\Gamma \int_0^1 I_{\{\gamma_0 \in (\Sigma_{i-1}(t),\, \Sigma_i(t)]\}}\, q(x(s-), \gamma, \alpha_i)\, N(ds\, d\gamma\, d\gamma_0). \tag{2.7}$$
The representation (2.7) yields a process (interjump and jump) with the
same probability distribution as (2.4). The form of (2.7) emphasizes, again,
that the actual realization of the jump value is determined by a random-
ization via the relaxed control measure. The representations differ only in
the realization of the randomization. The presence of the discontinuous in-
dicator function in (2.7) does not affect the existence, uniqueness, or the
approximation arguments, since it does not depend on the state. This rep-
resentation in terms of a set of mutually independent Poisson measures
works only if the control takes a finite or countable number of values.
(2.8)
$$p(t) = \int_0^t \int_\Gamma \int_U \Phi(s, \gamma, \alpha)\, N_m(ds\, d\gamma\, d\alpha),$$
and let $f(\cdot)$ be a bounded and continuous real valued function. Then the compensator for $f(p(\cdot))$ is
$$\int_0^t \int_\Gamma \int_U \cdots$$
where
$$J(t) = \int_0^t \int_\Gamma \int_U q(x(s-), \gamma, \alpha)\, N_m(ds\, d\gamma\, d\alpha). \tag{2.13}$$
Under the conditions in the introduction to the chapter and (A2.1), there
is a unique (weak sense) solution to (2.12) for each initial condition. The
bounds on the reflection terms given in Theorem 11.1.1 continue to hold.
For the motivating problem where the jumps were represented by either of (2.4) or (2.7), the $N_{m^\delta}(ds\, d\gamma\, d\alpha)$ defined above (2.8) (when setting $m = m^\delta$) equals that defined here. Furthermore, it follows from Theorem 2.1 that $N_{m^\delta}(\cdot)$ converges weakly to a relaxed Poisson measure $N_m(\cdot)$ associated with the limit relaxed control $m(\cdot)$. Also,
converges weakly to
and $(x^\delta(\cdot), z^\delta(\cdot), m^\delta(\cdot), N_{m^\delta}(\cdot))$ converges weakly to $(x(\cdot), z(\cdot), m(\cdot), N_m(\cdot))$. In general, if there are a finite number of points $\{\alpha_i, i \le k\}$ on which $m_t(\cdot)$ is concentrated for almost all $\omega, t$, then the jump processes can be represented in terms of $k$ mutually independent and identically distributed Poisson measures.
probability of more than one jump is $o(\delta)$. It also tells us that the jump distribution is $\Pi(\cdot)$ and that the jump value (given a jump) is independent of the time of the jump. If $x(\cdot)$ is an $\mathcal{F}_t$-adapted process with paths in $D^r[0, \infty)$, then the process defined by
$$\int_0^t \int_\Gamma \int_U q(x(s-), \gamma, \alpha)\, N_m(ds\, d\gamma\, d\alpha) - \lambda \int_0^t \int_\Gamma \int_U q(x(s-), \gamma, \alpha)\, \Pi(d\gamma)\, m_s(d\alpha)\, ds \tag{2.14}$$
Next, let $\{\mathcal{F}_t^n, t \ge 0\}$, $w^n(\cdot)$, $m^n(\cdot)$, $N_{m^n}(\cdot)$ be a sequence of filtrations, standard $\mathcal{F}_t^n$-Wiener processes, admissible controls, and relaxed $\mathcal{F}_t^n$-Poisson measures. We have the following limit theorem.
Theorem 2.1. Under the conditions in the introduction to the chapter and (A2.1), the set
is tight. The limit of any weakly convergent subsequence satisfies (2.12) and (2.13). Let $\{\mathcal{F}_t, t \ge 0\}$ denote the filtration induced by the limit. Then $w(\cdot)$ is a standard $\mathcal{F}_t$-Wiener process, $m(\cdot)$ is admissible, and $N_m(\cdot)$ is a relaxed Poisson measure with compensator process defined by (2.9).
converges weakly to J(·). Now, piece the interjump limits and jump limits
together to get (2.12) and (2.13).
Theorem 2.2. Assume one of the cost functions in (11.1.3), the conditions in the introduction to the chapter, and (A2.1). Then there is an optimal control for the relaxed problem.

Theorem 2.3. Assume the conditions in the introduction to the chapter and (A2.1). Then the set $(x^\delta(\cdot), z^\delta(\cdot), m^\delta(\cdot), w^\delta(\cdot), N_{m^\delta}(\cdot))$ converges weakly to $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$ and $W_\beta(x, m^\delta) \to W_\beta(x, m)$.
$(w(\cdot), N(\cdot), m(\cdot))$ be a Wiener process, Poisson measure, and relaxed control, respectively, with respect to some filtration. Define the piecewise constant control $u^\Delta(\cdot)$ as follows. For $l \ge 1$, define $\tau_i^\Delta(l) = m(l\Delta, \alpha_i) - m(l\Delta - \Delta, \alpha_i)$ and divide each interval $[l\Delta, l\Delta + \Delta)$ into subintervals of lengths $\tau_1^\Delta(l), \ldots, \tau_k^\Delta(l)$. Then use the control value $\alpha_i$, $i \le k$, on the subintervals successively. Let $m^\Delta(\cdot)$ denote the relaxed control representation of $u^\Delta(\cdot)$. Let $N_{m^\Delta}(\cdot)$ denote the associated relaxed Poisson measure, and $(x^\Delta(\cdot), z^\Delta(\cdot))$ the corresponding solution to (2.12) and (2.13). Then
$$(x^\Delta(\cdot), z^\Delta(\cdot), w^\Delta(\cdot), m^\Delta(\cdot), N_{m^\Delta}(\cdot))$$
converges weakly to $(x(\cdot), z(\cdot), w(\cdot), m(\cdot), N_m(\cdot))$, solving (2.12) and (2.13).
Infima over Ordinary Controls. Theorems 2.3 and 2.4 imply that the
infimum of the costs over the ordinary admissible controls equals the infi-
mum over the relaxed controls. Thus, the extension of the model via the
introduction of the relaxed Poisson measure does not affect the infimum of
the cost function.
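The piecewise constant control of the theorem can be constructed numerically. The sketch below is an assumption-laden illustration (two control values; a relaxed control with smooth weights; midpoint quadrature for the increments $m(l\Delta, \alpha_i) - m(l\Delta - \Delta, \alpha_i)$): each coarse interval is split into subintervals whose lengths are these increments, and the value $\alpha_i$ is used on the $i$th subinterval.

import numpy as np

# Piecewise constant ("chattering") control u^Delta built from a relaxed
# control m(.), as in the construction above. Assumed example: two control
# values with relaxed weights m_t({a1}) = w(t), m_t({a2}) = 1 - w(t).
alphas = np.array([-1.0, 1.0])
w = lambda t: 0.5 * (1.0 + np.sin(t))      # weight on alphas[0], assumed

def u_delta(t, Delta):
    """Value of the chattering control at time t."""
    l = int(t // Delta)                     # current coarse interval
    # increments m(l*Delta, a_i) - m(l*Delta - Delta, a_i): integrals of
    # the weights over the previous coarse interval (midpoint rule here)
    tm = max(l * Delta - 0.5 * Delta, 0.0)
    tau = Delta * np.array([w(tm), 1.0 - w(tm)])
    # alphas[0] on the first subinterval of length tau[0], then alphas[1]
    s = t - l * Delta
    return alphas[0] if s < tau[0] else alphas[1]

# The relaxed control and its chattering approximation have nearly equal
# time averages, which is what drives the weak convergence as Delta -> 0.
Delta, T = 0.1, 1.0
ts = np.arange(0.0, T, Delta / 100)
print(np.mean([u_delta(t, Delta) for t in ts]))
print(np.mean([alphas[0] * w(t) + alphas[1] * (1 - w(t)) for t in ts]))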
(2.17)
for a standard Poisson measure with jump rate $\lambda$ and jump distribution $\Pi(\cdot)$.
where $u^h(\cdot)$ is the control process, $\epsilon^h(\cdot)$ converges weakly to the "zero" process, and $z^h(\cdot)$ is the reflection term. The jump term can be represented as
$$J^h(t) = \int_0^t \int_{\Gamma^h} q^h(\psi^h(s), \gamma, u^h(s))\, N^h(ds\, d\gamma), \tag{2.19}$$
(2.20)
$$J^h(t) = \int_0^t \int_{\Gamma^h} \int_U q^h(\psi^h(s), \gamma, \alpha)\, N_{m^h}^h(ds\, d\gamma\, d\alpha). \tag{2.21}$$
Henceforth, we assume the conditions in the introduction to the chapter, (A2.1) and (A2.2). From this point on, the proof that $(\psi^h(\cdot), z^h(\cdot), m^h(\cdot), w^h(\cdot), N_{m^h}^h(\cdot))$ converges weakly to $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$ is essentially that of Theorem 2.1. The weak convergence implies that
in Section 14.2.
In many problems we must also consider costs that are discontinuous.
If the discontinuity is in the stopping or exit cost, then convergence can
be proved under a mild "controllability" condition. Discontinuities of this
sort are considered in Section 14.2. The problem becomes much more dif-
ficult if the discontinuity is in the running cost. In the case of stochastic
control problems, such discontinuities may essentially be ignored (with re-
spect to the convergence of the numerical schemes) if the optimally con-
trolled process induces a measure on path space under which the total cost
is continuous w.p.1 (see the remark after Theorem 10.1.1). When the un-
derlying processes are deterministic the situation is quite different. In this
case, properly dealing with the discontinuity becomes the main focus of in-
terest. Although the dynamics are deterministic, the natural formulation of
the cost along the discontinuity involves a "randomization" of the costs on
either side, and so one might expect probabilistic methods to be effective.
We will see that the Markov chain method continues to work well and, in
fact, provides a very natural way to deal with a difficult problem for which
there are no alternative methods.
Many of the complicating features we will discuss can also occur in stan-
dard stochastic and deterministic control problems. The interested reader
can combine the methods used in this chapter and the next with those
introduced previously to treat some of these generalizations.
An outline of the chapter is as follows. In Section 14.1 we consider a fixed
time interval and suppose that the cost is the sum of a continuous running
cost and a terminal cost. This is the calculus of variations analogue of the
control problem treated in Chapter 12, and is known as a Bolza problem.
In Section 14.2 we describe the numerical schemes and give the proof of
convergence. Problems where the running cost is discontinuous in the state
variable are considered in Section 14.3. Problems with a controlled stopping
time are the topic of Chapter 15.
This condition is natural and holds in most applications. Under (1.1) there exists a convex function $l : [0, \infty) \to (-\infty, +\infty]$ which is bounded from below and satisfies
$$k(x, \alpha) \ge l(|\alpha|) \text{ for all } (x, \alpha), \quad \text{and} \quad \lim_{c \to \infty} l(c)/c = \infty. \tag{1.2}$$
where the infimum is over all absolutely continuous functions $\phi : [0, T] \to \mathbb{R}^k$ satisfying $\phi(0) = x$. For notational convenience we rewrite this problem as an optimal control problem. The set of admissible controls for this problem will consist of all measurable functions from $[0, T]$ to $\mathbb{R}^k$. Let $u(\cdot)$ be any admissible control. The dynamics of the controlled process are then given simply by $\dot x(t) = u(t)$, $x(0) = x$, and the cost to be minimized is
where the infimum is over all absolutely continuous functions $\phi : [t, T] \to \mathbb{R}^k$ satisfying $\phi(t) = x$. This is the calculus of variations analogue of the problems treated in Chapter 12. To simplify the notation, we consider only $V(x)$, but note that the approximation schemes that are derived below actually yield approximations to $V(x, t)$ for all $t \in [0, T]$.
Just as in Chapter 3 it is possible to formally derive a Bellman equation for the cost $V(x, t)$, at least for the case where $k(\cdot, \cdot)$ is continuous. For the problem under consideration, the differential operator takes the particularly simple form
$$\mathcal{L}^\alpha f(x) = f_x'(x)\alpha,$$
where the control variable $\alpha$ takes values in $\mathbb{R}^k$. The formal Bellman equation is then given by
$$V_t(x, t) + \inf_{\alpha \in \mathbb{R}^k}\left[\mathcal{L}^\alpha V(x, t) + k(x, \alpha)\right] = 0, \qquad V(x, T) = g(x).$$
It is sometimes the case that the functions $k(\cdot)$ and $g(\cdot)$ satisfy continuity conditions. However, there are many interesting applications where this is not true. For example, it may be the case that the path $\phi(\cdot)$ is required to stay in some closed set $G$ over the interval $[0, T]$. This can be incorporated into the problem by defining $k(x, \alpha)$ to be $+\infty$ for $x \notin G$. It is also possible that constraints are placed on $\dot\phi(\cdot)$ or on the location of $\phi(T)$. These can also be incorporated into the problem as given above by suitably redefining $k(\cdot)$ and $g(\cdot)$. In general, these control and state space constraints can be readily dealt with (when constructing the numerical method and proving its convergence). See, for example, the convergence theorems of Section 14.2 and Chapter 15. A case with a discontinuity that is more difficult to deal with appears in Section 14.3.
For numerical purposes we may require that the state space be bounded. One method for bounding that produces an algorithm which is simple to program is to simply stop $\phi(\cdot)$ at the first time $\tau$ that it leaves the interior of a suitable set, such as
at which time a stopping cost $g(\phi(\tau), \tau)$ may be assessed. This would add a Dirichlet boundary condition to the Bellman equation above. For notational convenience we will combine the terminal and stopping costs into one function $g(x, t)$, and for later purposes we will assume that $g(\cdot, \cdot)$ is defined on all of $\mathbb{R}^k \times [0, T]$. The calculus of variations problem then becomes
Alternatively, one can use the analogue of the "reflecting" boundary con-
dition for diffusions that was introduced in Section 1.4.
Fix a particular $x \in G$. For either of these methods of bounding the state space, a condition that is sufficient to guarantee that the value for the original problem (1.3) and the value of the modified problem (1.4) are the same is that the minimizing trajectories for the two problems be the same and remain in $G^0$. It is often the case that properties of the functions $k(\cdot, \cdot)$ [e.g., the lower bound $l(\cdot)$] and $g(\cdot)$ can be exploited to obtain a bound on the range
$$R = \{\phi(t) : 0 \le t \le T\},$$
where $\phi(\cdot)$ starts at $x$ at time 0 and is a minimizing trajectory for the problem with no boundary. In such a case, $G^0$ should be chosen to contain $R$. By imposing a suitably large stopping cost on the set $\{(x, t) : x \in \partial G,\, t \in [0, T)\}$, it can be assured that the minimizing trajectories for the original and bounded problems are the same and remain in $G^0$.
Owing to its practical importance with regard to numerical implemen-
tation, we will use the problem (1.4) as our canonical example of a finite
time problem. In particular, the convergence of numerical schemes will be
demonstrated for this problem. It is worth noting that under appropriate
uniformly for $\alpha \in \mathbb{R}^k$, and similarly for (2.2). All chains considered in this chapter and the next are assumed to be locally consistent in this sense. We will also assume throughout the rest of this chapter that $\Delta t^h(\alpha) \to 0$ for each $\alpha \in \mathbb{R}^k$ and that
and $\Delta t^h(\alpha) = h\big(\sum_j |\alpha_j|\big)^{-1}$. This definition actually makes sense only when $\alpha \neq 0$, and we take care of the omitted case by setting
$$p^h(x, y \mid 0) = \begin{cases} 1 & \text{if } y = x \\ 0 & \text{otherwise.} \end{cases}$$
The sequence $\Delta t^h(0) > 0$ is arbitrary [for the purposes of satisfying (2.1) and (2.2)] and for simplicity we take $\Delta t^h(0) = h$. $\blacksquare$
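A minimal sketch of the chain of Example 2.1 follows. The nearest-neighbor transition law is an assumption consistent with the interpolation interval above: from $x$, move to $x + h\,\mathrm{sign}(\alpha_i)e_i$ with probability $|\alpha_i| / \sum_j |\alpha_j|$, so that the mean increment per step is $\alpha\,\Delta t^h(\alpha)$.

import numpy as np

def chain(x, alpha, h):
    """Transition law p^h(x, . | alpha) and interval Delta t^h(alpha) for
    the canonical chain: move to x + h*sign(alpha_i)*e_i with probability
    |alpha_i| / ||alpha||_1, and Delta t^h(alpha) = h / ||alpha||_1."""
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    norm1 = np.abs(alpha).sum()
    if norm1 == 0.0:                     # omitted case: stay put, dt = h
        return [(tuple(x), 1.0)], h
    moves = []
    for i, ai in enumerate(alpha):
        if ai != 0.0:
            y = x.copy()
            y[i] += h * np.sign(ai)
            moves.append((tuple(y), abs(ai) / norm1))
    return moves, h / norm1

# Local consistency check: E[xi_{n+1} - xi_n] = alpha * Delta t^h(alpha).
moves, dt = chain([0.0, 0.0], [2.0, -1.0], h=0.1)
mean = sum(p * np.array(y) for y, p in moves)
print(mean, np.array([2.0, -1.0]) * dt)   # the two agree exactly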
We will require for the finite time problem that ~th(a) be independent
of the control. Following the development of Chapter 12, we next derive
explicit and implicit schemes from transition probabilities satisfying the
local consistency conditions (2.1) and (2.2).
$$U^{h,\delta} = \Big\{\alpha : \sum_{j=1}^k |\alpha_j| \le h/\delta\Big\}. \tag{2.5}$$
It may be that bounds on the optimal control for the original calculus of variations problem are available. For example, it may be that one can calculate $B$ such that
$$\operatorname{ess\,sup}_{t \in [0,T]} \sum_j |\dot\phi_j(t)| \le B,$$
where ess sup stands for essential supremum and $\phi(\cdot)$ is a minimizing path. Then we must assume conditions which guarantee that the restricted control space $U^{h,\delta}$ is eventually "big enough." For example, in the case of Example 2.1 we must assume that the pair $(h, \delta)$ are sent to zero in such a way that
$$\liminf_{h,\delta} h/\delta \ge B.$$
If such a bound is not available, we must assume
$$\liminf_{h,\delta} U^{h,\delta} = \mathbb{R}^k, \tag{2.6}$$
i.e., $h$ and $\delta$ must be sent to zero in such a way that given any $u \in \mathbb{R}^k$ we have $u \in U^{h,\delta}$ for all sufficiently small $h > 0$ and $\delta > 0$.
Now let $G_h = G \cap S_h$ and $G_h^0 = G^0 \cap S_h$. Then the explicit approximation scheme for solving (1.4) is
$$V^{h,\delta}(x, n\delta) = \min_{\alpha \in U^{h,\delta}}\Big[\sum_y p^{h,\delta}(x, y \mid \alpha)\, V^{h,\delta}(y, n\delta + \delta) + k(x, \alpha)\delta\Big] \tag{2.7}$$
for $x \in G_h^0$ and $n\delta < T$, together with the boundary and terminal condition $V^{h,\delta}(x, n\delta) = g(x, n\delta)$ for $x \notin G_h^0$ and $n\delta \le T$, or $x \in G_h^0$ and $n\delta = T$.
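A minimal sketch of the recursion (2.7) follows, under illustrative assumptions not fixed by the text: one space dimension, a chain locally consistent with $\dot x = u$, running cost $k(x, \alpha) = \alpha^2/4 + 1$, stopping cost $g(x, t) = |x|$, and a finite control grid standing in for $U^{h,\delta}$.

import numpy as np

# Explicit scheme for V^{h,delta}(x, n*delta): backward recursion in time.
h, delta, T = 0.05, 0.01, 1.0
xs = np.arange(-1.0, 1.0 + h / 2, h)          # grid on G = [-1, 1]
N = int(round(T / delta))
g = lambda x: np.abs(x)                       # stopping/terminal cost, assumed
k = lambda x, a: a ** 2 / 4.0 + 1.0           # running cost, assumed
controls = np.linspace(-3.0, 3.0, 31)         # stands in for U^{h,delta}

V = g(xs).copy()                              # V(x, T) = g(x)
for n in range(N - 1, -1, -1):
    Vn = np.empty_like(V)
    for j, x in enumerate(xs):
        if j == 0 or j == len(xs) - 1:        # boundary: x not in G^0
            Vn[j] = g(x)
            continue
        best = np.inf
        for a in controls:
            # one-step transition of a chain consistent with xdot = a:
            # move one grid point in the direction of a with probability
            # |a|*delta/h, stay otherwise (so E[step] = a*delta)
            p = min(abs(a) * delta / h, 1.0)
            nbr = j + 1 if a > 0 else j - 1
            val = p * V[nbr] + (1.0 - p) * V[j] + k(x, a) * delta
            best = min(best, val)
        Vn[j] = best
    V = Vn
print(V[len(xs) // 2])                        # approximation to V(0, 0)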
(2.9)
(2.10)
where $p^h(x, y \mid \alpha)$ and $\Delta t^h(\alpha)$ satisfy the local consistency conditions (2.1) and (2.2). We also retain from Chapter 12 the notation $\xi_n^{h,\delta}$ for the controlled Markov chain and $(\zeta_n^{h,\delta}, \tau_n^{h,\delta})$ to denote separately the "spatial" and "temporal" components. Define $N^{h,\delta}(T) = \min\{n : \tau_n^{h,\delta} \ge T\}$. Then the implicit scheme for solving (1.4) is
A2.3. The function $g(\cdot, \cdot)$ is uniformly continuous and bounded when restricted to either of the sets $(\mathbb{R}^k - G^0) \times [0, T]$ and $G^0 \times \{T\}$.
Remarks. The condition (A2.2) occurs often in calculus of variations prob-
lems of the type we consider. See, for example, [59, Chapter 5]. The con-
tinuity assumption in (A2.2) is intermediate between continuity in x that
is uniform in a (which is much too restrictive) and simple continuity. Note
that (A2.3) does not assume continuity of g(·, ·). Continuity would not
be reasonable, since g models both the stopping cost and the terminal
cost. We have chosen the given form for g(·, ·) because it is common in
applications. Owing to the "controllability" of the dynamics $\dot\phi(t) = u(t)$, discontinuities in $g(\cdot, \cdot)$ are generally not too difficult to deal with. In particular, it follows from (A2.1) and (A2.2) that the infimum in (1.4) is the same if $g(\cdot, \cdot)$ is replaced by $g^*(\cdot, \cdot)$, where $g^*(x, t) = g(x, t)$ for $t < T$ and $g^*(x, T) = \lim_{\epsilon \to 0} \inf\{g(y, T) : |y - x| \le \epsilon\}$. We assume without loss of generality (by redefining $k$ off $G$ if need be) that $k(x, 0)$ is uniformly bounded for $x \in \mathbb{R}^k$.
It is sometimes desirable to weaken (A2.1). For a simple example, con-
sider the case of control until a target set is reached when the target set is
a single point. For obvious reasons, such target sets are not typically con-
sidered in stochastic control problems. However, they do appear often in
deterministic control. Clearly, the interior cone condition is satisfied at the
point, but the exterior cone condition is not. In the example of Subsection
15.3.3 we show how to extend the convergence proofs to cover such cases.
Definition. The set of admissible relaxed controls for this class of deterministic problems consists of all measures $m(\cdot)$ on the Borel subsets of $\mathbb{R}^k \times [0, \infty)$ which satisfy $m(\mathbb{R}^k \times [0, t]) = t$ for all $t \in [0, \infty)$.
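Numerically, such a measure can be approximated by an occupation-measure histogram. The sketch below is an illustrative discretization (the control $u(\cdot)$ and the grids are assumptions): it places mass $dt$ at $(u(t), t)$ for each time cell and checks the defining property $m(\mathbb{R}^k \times [0, t]) = t$.

import numpy as np

# Discretized relaxed-control representation of an ordinary control u(.):
# mass dt at the point (u(t), t) for each time cell, stored as a histogram
# over (control bin, time cell). Illustrative one-dimensional example.
u = lambda t: np.sign(np.sin(8 * t))        # an assumed ordinary control
T, nt = 1.0, 1000
dt = T / nt
a_bins = np.linspace(-1.5, 1.5, 61)         # control-space grid

m = np.zeros((len(a_bins) - 1, nt))         # m[i, j] ~ m(da x dt cell)
for j in range(nt):
    t = (j + 0.5) * dt
    i = np.searchsorted(a_bins, u(t)) - 1   # control bin containing u(t)
    m[i, j] += dt                           # relaxed control puts mass dt there

# Defining property of an admissible relaxed control:
# m(R^k x [0, t]) = t for every t.
t_idx = nt // 3
print(m[:, :t_idx].sum(), t_idx * dt)       # both equal t up to rounding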
Let $\epsilon > 0$ be given. Then there exist $\delta > 0$ and a finite set $\{\alpha_1^\epsilon, \ldots, \alpha_{k_\epsilon}^\epsilon\} = U_\epsilon \subset \mathbb{R}^k$ with the following properties. There is a function $u^\epsilon : [0, T] \to U_\epsilon$ which is constant on intervals of the form $[i\delta, i\delta + \delta)$, and such that if
$$x^\epsilon(t) = x + \int_0^t u^\epsilon(s)\, ds,$$
then
$$\sup_{0 \le t \le T} |x^\epsilon(t) - x(t)| \le \epsilon$$
and
$$\int_0^T \int_{\mathbb{R}^k} I_{\{|\alpha| \ge c(\epsilon)\}}\, |k(x(s), \alpha)|\, m(d\alpha\, ds) \le \epsilon/2 \tag{2.12}$$
and
$$\int_0^T \int_{\mathbb{R}^k} I_{\{|\alpha| \ge c(\epsilon)\}}\, |\alpha|\, m(d\alpha\, ds) \le \epsilon.$$
This last inequality implies
$$\sup_{0 \le t \le T} \Big|\int_0^t \int_{\mathbb{R}^k} I_{\{|\alpha| \ge c(\epsilon)\}}\, \alpha\, m(d\alpha\, ds)\Big| \le \epsilon. \tag{2.13}$$
$$\Big(\sup_{x \in \mathbb{R}^k} |k(x, 0)|\Big)\Big(\int_0^T \int_{\mathbb{R}^k} I_{\{|\alpha| \ge c(\epsilon)\}}\, m(d\alpha\, ds)\Big) \le \epsilon/2. \tag{2.14}$$
$$\sup_{0 \le t \le T}\Big|\int_0^t \int_{\mathbb{R}^k} k(x^\epsilon(s), \alpha)\, m_s^\epsilon(d\alpha)\, ds - \int_0^t \int_{\mathbb{R}^k} k(x(s), \alpha)\, m_s(d\alpha)\, ds\Big| \le \epsilon + f(\epsilon)\Big(MT + \int_0^T \int_{\mathbb{R}^k} k(x(s), \alpha)\, m_s(d\alpha)\, ds\Big). \quad \blacksquare$$
As was observed in Chapters 9 and 10, for problems with boundary con-
ditions one must pay particular attention to the manner in which controlled
trajectories reach the boundary. A previously discussed method for proving
the convergence of costs involved showing the continuity of the first exit time of $\phi$ when $\phi$ was a sample path of an underlying process (at least w.p.1). See the discussion in Section 10.2. Here, the deterministic nature of the problem makes this approach inapplicable. Instead of requiring that the exit time be continuous for all paths $x$ that solve $x(t) = x + \int_0^t u(s)\, ds$ for arbitrary admissible controls (which is impossible), we shall require only the existence of an $\epsilon$-optimal control such that the exit time is continuous at the associated solution. This condition will always hold under our assumptions [except when we weaken (A2.1) in Subsection 15.3.3] and turns out to be sufficient for the proof of the upper bound $\limsup_{h \to 0} V^h(x) \le V(x)$. To handle the lower bound $\liminf_{h \to 0} V^h(x) \ge V(x)$, we will need the following result.
Theorem 2.3. Assume the conditions (A2.1) and (A2.2). Let $m(\cdot)$ be any admissible relaxed control, $x \in G^0$, and let $x(t) = x + \int_0^t \int_{\mathbb{R}^k} \alpha\, m(d\alpha\, ds)$. Fix any $\sigma < \infty$, and assume that $x(t) \in G$ for $t \in [0, \sigma]$, and that
Then given $\epsilon > 0$ there exists a control $m^\epsilon(\cdot)$ with associated solution $x^\epsilon(\cdot)$ such that
$$x^\epsilon(t) \in G^0 \text{ for } t \in [0, \sigma), \quad x^\epsilon(0) = x(0), \quad x^\epsilon(\sigma) = x(\sigma),$$
and
Proof. The basic idea of the proof is to show that if the path $x(\cdot)$ is on $\partial G$ at any time $t \in (0, \sigma)$, then the interior cone condition and the continuity properties of the running cost imply that by "pushing" the path by a small amount in the $v(x(t))$ direction, we obtain a path that is interior to $G$ at time $t$ and that has nearly the same running cost as $x(\cdot)$.
We now proceed to the proof. Assume the theorem is false. Then there exists a control $m(\cdot)$ with associated solution $x(\cdot)$ for which the conclusion is not true. Define $A$ to be the set of all $t \in [0, \sigma)$ such that given any $\epsilon > 0$ there exists a control $m^*(\cdot)$ with associated solution $x^*(\cdot)$ with the properties that $x^*(s) \in G^0$ for all $s \in [0, t)$, $x^*(0) = x(0)$, $x^*(s) = x(s)$ and $m_s^*(\cdot) = m_s(\cdot)$ for $s \in [t, \sigma]$, and
for all $t \in [\sigma^* - \nu^*, \sigma^*)$ and $\gamma \in (0, \nu^*]$. For $B \subset \mathbb{R}^k$ and $x \in \mathbb{R}^k$, let $B + x = \{y + x : y \in B\}$. For each $c > 0$ we define an admissible relaxed control $m^c(\cdot)$ by
Let $x^c(\cdot)$ be the associated solution. The effect of this control is to force $x^c(t) \in G^0$ for $t \in [\sigma^* - \nu^*, \sigma^* + \nu)$, for all sufficiently small $c > 0$. This follows from (2.16) and (2.17). We also have $x^c(t) = x(t)$ and $m_t^c(\cdot) = m_t(\cdot)$ for $t \in [\sigma^* + \nu, \sigma]$, and $x^c(t) = x^*(t)$ for $t \in [0, \sigma^* - \nu^*)$. Under (A2.2), we have
have
f
lo
u*+v
J.
fik
k(xc(s), a)mc(dads)- f
lo
u*+v
f.
fik
k(x*(s), a)m*(dads) --+ 0
for each $S < \infty$. Then the set $\{m^h(\cdot), h > 0\}$ is tight. Assume also that the collection of initial conditions $\{\xi_0^h, h > 0\}$ is tight, and suppose that a subsequence (again indexed by $h$) is given such that $\{m^h(\cdot), h > 0\}$ converges weakly to a limit $m(\cdot)$ and $\xi_0^h$ converges weakly to $x$. Then $m(\cdot)$ is an admissible relaxed control and $\{\xi^h(\cdot), h > 0\}$ converges weakly to a limit $x(\cdot)$ that satisfies
$$x(t) - x = \int_0^t \int_{\mathbb{R}^k} \alpha\, m(d\alpha\, ds) \quad \text{w.p.1.}$$
Suppose for each $h > 0$ that $N_h$ is a stopping time for the chain $\{\xi_i^h, i < \infty\}$, and let $\tau_h = t_{N_h}^h$. If
$$\lim_{h \to 0} c(h)\, E_{m^h} \int_0^{\tau_h}\int_{\mathbb{R}^k} |\alpha|\, m^h(d\alpha\, ds) = 0,$$
in probability.
Proof. Fix $S < \infty$. By using the lower bound (1.2) and the assumptions of the theorem, we obtain
$$\limsup_{h \to 0} E_{m^h}\Big[\int_{\mathbb{R}^k \times [0,S]} l(|\alpha|)\, m^h(d\alpha\, ds)\Big] < \infty, \tag{2.18}$$
$$\limsup_{b \to \infty}\limsup_{h \to 0} E_{m^h}\Big[\int_{\mathbb{R}^k \times [0,S]} I_{\{|\alpha| \ge b\}}\, |\alpha|\, m^h(d\alpha\, ds)\Big] = 0. \tag{2.19}$$
and the interpolation $\xi^h(t) = \xi_i^h$ for $t \in [t_i^h, t_{i+1}^h)$. The process $\epsilon^h(\cdot)$ keeps track of the "error" terms in (2.1). Owing to the definition of this process,
Therefore, if we define
then the process $\{\gamma^h(t_i^h), i < \infty\}$ is a martingale. A calculation using (2.2) gives
$$E_{m^h}|\gamma^h(S_h)|^2 \le c(h)\, E_{m^h}\int_0^{S_h}\int_{\mathbb{R}^k} |\alpha|\, m^h(d\alpha\, ds),$$
where $c(h) \to 0$ as $h \to 0$. Thus, under the assumptions of the theorem,
$$\sup_{0 \le t \le S}\Big|\xi^h(t) - \xi^h(0) - \int_0^t\int_{\mathbb{R}^k} \alpha\, m^h(d\alpha\, ds) - \epsilon^h(t)\Big| \to 0.$$
Using the Skorokhod representation we can assume that the convergence $m^h(\cdot) \to m(\cdot)$ is w.p.1 in the topology of weak convergence of measures on $\mathbb{R}^k \times [0, \infty)$. By (2.19) and the fact that $m(\mathbb{R}^k \times \{t\}) = 0$ for all $t \in [0, S]$, we obtain
$$\int_0^t\int_{\mathbb{R}^k} \alpha\, m^h(d\alpha\, ds) \to \int_0^t\int_{\mathbb{R}^k} \alpha\, m(d\alpha\, ds)$$
w.p.1. It follows from (2.1) that $\sup_{0 \le t \le S} |\epsilon^h(t)| \to 0$. Combining these facts with the representation (2.20), we have that $(\xi^h(\cdot), m^h(\cdot))$ converges in probability to $(x(\cdot), m(\cdot))$, where
$$x(t) = x + \int_0^t\int_{\mathbb{R}^k} \alpha\, m_s(d\alpha)\, ds.$$
Theorem 2.5. Assume (A2.1), (A2.2), (A2.3), and that (2.6) holds. Then for the explicit scheme defined by (2.7) we have
$$V^{h,\delta}(x) \to V(x).$$
Proof. Let $\{\xi_i^{h,\delta}, i < \infty\}$ be a controlled locally consistent Markov chain as in Subsection 14.2.1 with the transition probabilities $p^{h,\delta}(x, y \mid \alpha)$, interpolation interval $\delta$, and initial condition $x \in G^0$. Let $\{u_i^{h,\delta}, i < \infty\}$ denote a sequence of controls that are applied to the chain. We define the interpolated process and control by setting
$$\xi^{h,\delta}(t) = \xi_i^{h,\delta}, \quad u^{h,\delta}(t) = u_i^{h,\delta}, \quad t \in [i\delta, i\delta + \delta).$$
Define $\tau_{h,\delta}$ to be the first time $\xi^{h,\delta}(\cdot)$ leaves $G_h^0$. As usual, we use relaxed rather than ordinary controls to prove the convergence of the scheme. We define $m^{h,\delta}(\cdot)$ by setting
Proof of the Lower Bound. For any $\epsilon > 0$, let $\{m^{h,\delta}(\cdot), h > 0, \delta > 0\}$ be the relaxed control representation of an $\epsilon$-optimal sequence of ordinary admissible controls. In proving the lower bound, we can assume without loss that the running costs associated to this sequence are bounded from above, and therefore that Theorem 2.4 applies. Let $(h, \delta)$ denote any subsequence, and retain $(h, \delta)$ to denote a further subsequence along which $(\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \tau_{h,\delta})$ converges weakly. We have the inequality
$$V^{h,\delta}(x) \ge E_x^{m^{h,\delta}}\Big[\int_0^{T \wedge \tau_{h,\delta}}\int_{\mathbb{R}^k} k(\xi^{h,\delta}(s), \alpha)\, m^{h,\delta}(d\alpha\, ds) + g\big(\xi^{h,\delta}(T \wedge \tau_{h,\delta}),\, T \wedge \tau_{h,\delta}\big)\Big] - \epsilon.$$
We claim that
$$\liminf_{(h,\delta) \to 0}\Big[\int_0^{T \wedge \tau_{h,\delta}}\int_{\mathbb{R}^k} k(\xi^{h,\delta}(s), \alpha)\, m^{h,\delta}(d\alpha\, ds) + g\big(\xi^{h,\delta}(T \wedge \tau_{h,\delta}),\, T \wedge \tau_{h,\delta}\big)\Big] \ge V(x). \tag{2.21}$$
The case $\bar\tau < T$, $\tau = \bar\tau$. In this case (2.21) follows from the convergence $(\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \tau_{h,\delta}) \to (x(\cdot), m(\cdot), \bar\tau)$, Fatou's lemma, and (A2.2) and (A2.3).
The case $\bar\tau < T$, $\tau < \bar\tau$. For this case we still have the left hand side of (2.21) bounded below by
The cases $\bar\tau > T$, $\tau = \bar\tau$ and $\bar\tau > T$, $\tau < \bar\tau$ are handled in the same way as $\bar\tau < T$, $\tau = \bar\tau$ and $\bar\tau < T$, $\tau < \bar\tau$, respectively. Finally, we must deal with $\bar\tau = T$. The only difference between this case and the previous cases is that $g(\cdot, \cdot)$ may be discontinuous at $(x(T), T)$. Consider first the case when $\tau = \bar\tau$, i.e., $x(t) \in G^0$ for $t \in [0, T)$. In this case, (2.21) follows from the fact that
Proof of the Upper Bound. Fix $\epsilon > 0$ and let $m(\cdot)$ be an $\epsilon$-optimal admissible relaxed control for $V(x)$. Let $x(\cdot)$ denote the associated solution. We first show we can assume [by modifying $m(\cdot)$ if need be] that $x(\cdot)$ either exits $G^0$ in a "regular" way or else remains in $G^0$.
Case 2. Suppose that $\tau = \inf\{t : x(t) \in \partial G\} < T$. Assumptions (A2.1) and (A2.2) imply that we may redefine $m_s(\cdot)$ for $s$ in some interval of the form $s \in (\tau, v]$, $v > \tau$, in such a way that $x(t) \notin G$ for $t \in (\tau, v)$ and
$$\int_\tau^t\int_{\mathbb{R}^k} k(x(s), \alpha)\, m_s(d\alpha)\, ds \to 0 \quad \text{as } t \downarrow \tau.$$
Case 3. Suppose that $x(t) \in G^0$ for $t \in [0, T)$ and $x(T) \in \partial G$. Then there are two possibilities. If $\lim_{\epsilon \to 0}\inf\{g(y, T) : |x(T) - y| \le \epsilon,\, y \in G^0\} \le g(x(T), T)$, then by the same argument as in Theorem 2.3 there exists a control $m^\epsilon(\cdot)$ which is $2\epsilon$-optimal and for which the associated solution remains in $G^0$ for all $t \in [0, T]$. Otherwise, we can use the exterior cone condition of (A2.1) and (A2.2) and find a control $m^\epsilon(\cdot)$ which is $2\epsilon$-optimal and for which the associated solution exits $G^0$ before time $T$. Hence, case 3 reduces to either case 1 or case 2.
The main point of all this is that we can always find an $\epsilon$-optimal control that avoids the problematic case where $x(\cdot)$ exits $G^0$ at time $T$. More precisely, we may assume that given any $\epsilon > 0$ there exists an $\epsilon$-optimal control $m(\cdot)$ with associated solution $x(\cdot)$ such that either
The case $\tau > T$. Let any $\epsilon_1 > 0$ be given. By Theorem 2.2 there is a finite set $U_{\epsilon_1} \subset \mathbb{R}^k$, $\delta > 0$, and an ordinary control $u^{\epsilon_1}(\cdot)$ with the following properties. $u^{\epsilon_1}(\cdot)$ takes values in $U_{\epsilon_1}$, is constant on intervals of the form $[j\delta, j\delta + \delta)$, and if $x^{\epsilon_1}(\cdot)$ is the associated solution, then
$$\sup_{0 \le t \le T} |x^{\epsilon_1}(t) - x(t)| \le \epsilon_1$$
and
$$\sup_{0 \le t \le T}\Big|\int_0^t\int_{\mathbb{R}^k} k(x(s), \alpha)\, m_s(d\alpha)\, ds - \int_0^t k(x^{\epsilon_1}(s), u^{\epsilon_1}(s))\, ds\Big| \le \epsilon_1.$$
Under (2.6) we have $U_{\epsilon_1} \subset U^{h,\delta}$ whenever $h > 0$ and $\delta > 0$ are sufficiently small. We apply the control $u^{\epsilon_1}(\cdot)$ to the chain $\{\xi_i^{h,\delta}, i < \infty\}$ in the obvious way, namely,
The case $\tau < T$. Next consider the case of (2.23). As in the case of (2.22) we can assume the existence of a piecewise constant ordinary control with the given properties, save that $T$ is replaced by $v$. Because $\sup_{0 \le t \le v} |x^{\epsilon_1}(t) - x(t)|$ can be made arbitrarily small, (2.23) implies we can also assume that
We now apply $u^{\epsilon_1}(\cdot)$ as in the previous case. By sending $(h, \delta) \to 0$ and then $\epsilon_1 \to 0$, we obtain $\limsup_{(h,\delta) \to 0} V^{h,\delta}(x) \le V(x) + \epsilon$ from (A2.2), (A2.3), (2.23), and (2.24). Since $\epsilon > 0$ is arbitrary, the upper bound is proved. $\blacksquare$
Theorem 2.6. Assume (A2.1), (A2.2), and (A2.3). Then for the implicit
scheme defined by (2.11) we have
Recall that $\tau$ is the first time that $\phi(\cdot)$ exits the interior of a given set $G$ and that $T > 0$ is fixed.
In many calculus of variations problems, there is some underlying dynamical model for a "physical" system which determines the function $k(\cdot, \cdot)$, and minimizing (or nearly minimizing) paths $\phi(\cdot)$ have important physical interpretations. For example, in the case of classical Hamiltonian mechanics, the relevant laws of motion define $k(\cdot, \cdot)$, and the path $\phi(\cdot)$ describes the trajectory followed by a particle subject to those laws. In the case of geometric optics, $k(\cdot, \cdot)$ is defined in terms of the local speed of light. Thus,
in the typical formulation of a calculus of variations problem appropriate
to some given applied problem, the function k(·, ·) is obtained during the
modelling stage.
Clearly, the function $k(\cdot, \cdot)$ reflects properties of the "medium" in which the underlying dynamical system evolves. Typically, the definition is local in the sense that $k(x, \cdot)$ reflects the properties of the medium at $x$. If $k(x, \alpha)$ possesses some kind of continuity in $x$, then it seems reasonable to believe this reflects a type of spatial continuity in the properties of the medium; e.g., the speed of propagation varies continuously in $x$ in the geometrical optics problem. However, in many problems of interest such continuity properties may be violated. For example, it may be the case that the space is divided into two (or more) disjoint regions $R^{(1)}$ and $R^{(2)}$, with a smooth interface separating the regions. In each region the physical properties of the media vary continuously, but they differ from one region to the other. It is simple to produce examples of this type from classical mechanics or geometrical optics. More recent examples come from large deviations theory [40] (and, in particular, the application of large deviations theory to queueing systems [44]).
In such a case, one must rethink the modelling of the original physical problem as a calculus of variations problem. Clearly, the modelling appropriate for a single region of continuous behavior should be appropriate for defining or identifying $k(x, \cdot)$ when $x$ is in the relative interior of either of the regions $R^{(1)}$ or $R^{(2)}$. However, there is still the question of the proper definition of $k(x, \cdot)$ for points $x$ on the interface. This is quite important because in many cases the optimal paths will spend some time on the interface. The mathematical problem (3.1) will be well posed under appropriate measurability assumptions on $k(\cdot, \cdot)$ alone. But from the point of view of modelling, certain additional properties can be expected (or perhaps should even be demanded). For example, regardless of how $k(x, \cdot)$ is defined on the interface, it should lead to a cost function $V(x)$ which has desirable stability properties under approximations, discretizations, small perturbations, etc. This turns out to impose restrictions on the form of $k(x, \cdot)$.
In this section we will present what appears to be a "natural" definition
of the integrand on the interface, and describe an associated numerical
procedure. By natural, what is meant is that the definition occurs in ap-
plications, leads to a value that is stable under discretizations, and can be
shown to be the only definition on the boundary that is stable under a wide
class of approximations. We will also show that this particular definition of
the cost on the interface, in spite of its complicated appearance, allows the
use of relatively simple numerical approximation procedures. To simplify
the notation we will assume that the interface is "flat," in which case we can take the interface to be $\{x \in \mathbb{R}^k : x_1 = 0\}$, $R^{(1)} = \{x : x_1 \le 0\}$, and $R^{(2)} = \{x : x_1 > 0\}$, where the subscript denotes the first component of $x$. Generalizing the results contained in this section to cover a smooth curved boundary is not difficult. The case of several intersecting boundaries, which
$$k^{(0)}(x, \alpha) = \inf\big\{p^{(1)} k^{(1)}(x, \alpha^{(1)}) + p^{(2)} k^{(2)}(x, \alpha^{(2)})\big\}, \tag{3.2}$$
where the infimum is over $(p^{(1)}, p^{(2)}) \in \mathbb{R}^2$ and $(\alpha^{(1)}, \alpha^{(2)}) \in \mathbb{R}^{2k}$ satisfying
(3.3)
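The definition (3.2) can also be evaluated numerically. In the sketch below the constraint set (3.3), which is not reproduced above, is taken — consistently with (3.16)-(3.18) — to be $p^{(1)} + p^{(2)} = 1$, $p^{(i)} \ge 0$, $p^{(1)}\alpha^{(1)} + p^{(2)}\alpha^{(2)} = \alpha$, together with the assumed sign condition $(\alpha^{(1)})_1 \le 0 \le (\alpha^{(2)})_1$ so the averaged motion can stay on the interface; the cost functions $k^{(1)}, k^{(2)}$ are illustrative.

import numpy as np
from scipy.optimize import minimize

# Numerical evaluation of the interface running cost k^(0)(x, alpha) in
# (3.2). The constraint set is an assumption consistent with (3.16)-(3.18).
k1 = lambda x, a: 0.5 * np.sum((a - 1.0) ** 2) + 1.0   # assumed k^(1)
k2 = lambda x, a: 2.0 * np.sum((a + 1.0) ** 2) + 0.5   # assumed k^(2)

def k0(x, alpha, kdim=2):
    alpha = np.asarray(alpha, float)

    def obj(z):
        p, a1, a2 = z[0], z[1:1 + kdim], z[1 + kdim:]
        return p * k1(x, a1) + (1.0 - p) * k2(x, a2)

    cons = [
        {"type": "eq",                      # p1*a1 + p2*a2 = alpha
         "fun": lambda z: z[0] * z[1:1 + kdim]
                          + (1.0 - z[0]) * z[1 + kdim:] - alpha},
        {"type": "ineq", "fun": lambda z: -z[1]},          # (a1)_1 <= 0
        {"type": "ineq", "fun": lambda z: z[1 + kdim]},    # (a2)_1 >= 0
    ]
    bounds = [(1e-6, 1 - 1e-6)] + [(None, None)] * (2 * kdim)
    z0 = np.concatenate(([0.5], alpha, alpha))
    res = minimize(obj, z0, method="SLSQP", bounds=bounds,
                   constraints=cons)
    return res.fun

# alpha with zero first component, as for motion along the interface {x1 = 0}
print(k0(x=np.zeros(2), alpha=np.array([0.0, 1.0])))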
time process will actually move away from the interface, contradicting our assumption that it remain near the interface for $i \in \{i_1, i_1 + 1, \ldots, i_2\}$.
$$V^{h,\delta}(x, n\delta) = \min_{\alpha \in U^{h,\delta}}\Big[\sum_y \bar p^{h,\delta}(x, y \mid \alpha)\, V^{h,\delta}(y, n\delta + \delta) + k(x, \alpha)\delta\Big] \tag{3.8}$$
for $x \in G_h^0$ and $n\delta < T$, together with the boundary and terminal condition $V^{h,\delta}(x, n\delta) = g(x, n\delta)$ for $x \notin G_h^0$ and $n\delta < T$, or $x \in G_h^0$ and $n\delta = T$. Recall that $U^{h,\delta}$ was defined in (2.5).
Because our main interest in this section is in examining the new features
that are associated with the discontinuous running cost, we will replace
{A2.1) and {A2.3) by the following somewhat stronger conditions.
A3.1. The set $G$ is compact and satisfies interior and exterior cone conditions: There exist $\epsilon > 0$ and continuous functions $v(\cdot)$ and $w(\cdot)$ such that
Theorem 3.1. Assume (A3.1), (A3.2), and that both $k^{(1)}(\cdot, \cdot)$ and $k^{(2)}(\cdot, \cdot)$ satisfy (1.1) and (A2.2). Let $k(\cdot, \cdot)$ be defined by (3.2)-(3.6), assume that (2.6) holds, and let $V(x)$ be defined by (3.1). Then for the explicit scheme defined by (3.8), we have
Proof. We begin by recalling the notation used in the proof of Theorem 2.5. Thus, $\{\xi_i^{h,\delta}, i < \infty\}$ is a controlled Markov chain as described in Subsection 14.2.1 with transition probabilities $\bar p^{h,\delta}(x, y \mid \alpha)$, interpolation interval $\delta$, and initial condition $x \in G^0$. If $\{u_i^{h,\delta}, i < \infty\}$ is a sequence of controls applied to the chain, then we define the interpolated process and control by setting
Define $\tau_{h,\delta}$ to be the first time $\xi^{h,\delta}(\cdot)$ leaves $G_h^0$, and let $m^{h,\delta}(\cdot)$ denote the relaxed control representation of $u^{h,\delta}(\cdot)$.
Before proving the lower bound, we must state the following lemma. The lemma is analogous to and is used in the same way as Theorem 2.3.
Lemma 3.2. Assume the conditions of Theorem 3.1, and let $\phi(\cdot)$ be an absolutely continuous function that satisfies $\phi(0) \in G^0$, $\phi(t) \in G$ for $t \in [0, \sigma]$, and
$$\int_0^\sigma k(\phi(s), \dot\phi(s))\, ds < \infty.$$
Then given $\epsilon > 0$, there exists an absolutely continuous function $\phi^\epsilon(\cdot)$ such that $\phi^\epsilon(t) \in G^0$ for $t \in [0, \sigma)$, $\phi^\epsilon(0) = \phi(0)$, $\phi^\epsilon(\sigma) = \phi(\sigma)$, and
The proof is essentially the same as that of Theorem 2.3 and is therefore omitted. We will only note that under (A3.1) we can assume the existence of $\gamma > 0$ and $v^*(\cdot)$ such that $x \in \partial G$ and $|(x)_1| \le \gamma$ imply $(v^*(x))_1 = 0$, where $v^*(\cdot)$ satisfies the conditions on $v(\cdot)$ given in the statement of the interior
Note that since $m^{h,\delta}(\cdot) = \nu^{(1),h,\delta}(\cdot) + \nu^{(2),h,\delta}(\cdot)$, the tightness of $\{m^{h,\delta}(\cdot), h > 0, \delta > 0\}$ implies the tightness of $\{\nu^{(i),h,\delta}(\cdot), h > 0, \delta > 0\}$ for $i = 1, 2$. The measures $\nu^{(1),h,\delta}(\cdot)$ and $\nu^{(2),h,\delta}(\cdot)$ record the control effort that is applied, when it is applied, and also distinguish between when the state $\xi^{h,\delta}(s)$ is in $\{x : (x)_1 \le 0\}$ and $\{x : (x)_1 > 0\}$.
We now apply the Skorokhod representation and extract a weakly converging subsequence from
with limit
$$\big(x(\cdot), m(\cdot), \nu^{(1)}(\cdot), \nu^{(2)}(\cdot), \bar\tau\big).$$
Equation (3.9) follows easily from the definitions of the $\nu^{(i),h,\delta}(\cdot)$ and the weak convergence, whereas (3.10) and (3.12) follow from the relationship $m^{h,\delta}(\cdot) = \nu^{(1),h,\delta}(\cdot) + \nu^{(2),h,\delta}(\cdot)$. The only property that is not obvious is (3.11). We will first prove the lower bound assuming (3.11) and then show (3.11).
Now fix an $\omega$ for which there is convergence via the Skorokhod representation. We have
$$\ge \int_0^{T \wedge \bar\tau}\int_{\mathbb{R}^k} k^{(1)}(x(s), \alpha)\, \nu^{(1)}(d\alpha\, ds) + \int_0^{T \wedge \bar\tau}\int_{\mathbb{R}^k} k^{(2)}(x(s), \alpha)\, \nu^{(2)}(d\alpha\, ds) = \int_0^{T \wedge \bar\tau}\int_{\mathbb{R}^k}\big[k^{(1)}(x(s), \alpha)\, \nu_s^{(1)}(d\alpha) + k^{(2)}(x(s), \alpha)\, \nu_s^{(2)}(d\alpha)\big]\, ds.$$
The set $\{s : (x(s))_1 = 0, (\dot x(s))_1 \neq 0\}$ is a set of measure zero. Therefore, the definition of $k(\cdot, \cdot)$, the convexity of the $k^{(i)}(x, \cdot)$, and the properties of the $\nu_s^{(i)}(\cdot)$ given in (3.9)-(3.12) imply
$$\liminf_{(h,\delta) \to 0}\,(\cdots) \ \ge \int_0^{T \wedge \bar\tau} k(x(s), \dot x(s))\, ds + g\big(x(T \wedge \bar\tau),\, T \wedge \bar\tau\big).$$
As always, there is the difficulty due to the fact that $\tau = \inf\{t : x(t) \in \partial G\}$ might be smaller than $\bar\tau$. Using Lemma 3.2 in precisely the same way that Theorem 2.3 was used in the proof of Theorem 2.5, we conclude that
$$\int_0^{T \wedge \bar\tau} k(x(s), \dot x(s))\, ds + g\big(x(T \wedge \bar\tau),\, T \wedge \bar\tau\big) \ge V(x)$$
for the convergent subsequence. The proof of the lower bound is completed by using an argument by contradiction.
$$\big(x(\cdot), m(\cdot), \nu^{(1)}(\cdot), \nu^{(2)}(\cdot)\big).$$
$$f_\eta(z) = \begin{cases} F_\eta'(z) & \text{if } z \neq 0 \\ -1 & \text{if } z = 0, \end{cases}$$
and assume that $f_\eta(z) \to 0$ for $z \notin [-\gamma, \gamma]$ as $\eta \to 0$. We can write
$$4\gamma \ge \sum_{j=0}^{T/\delta - 1}\Big[F_\eta\big((\xi_{j+1}^{h,\delta})_1\big) - F_\eta\big((\xi_j^{h,\delta})_1\big)\Big] = \sum_{j=0}^{T/\delta - 1} \delta f_\eta\big((\xi_j^{h,\delta})_1\big)\big[(u_j^{h,\delta})_1\big] + e_1^{h,\delta} + e_2^{h,\delta},$$
where
$$e_1^{h,\delta} = \sum_{j=0}^{T/\delta - 1} f_\eta\big((\xi_j^{h,\delta})_1\big)\Big[(\xi_{j+1}^{h,\delta})_1 - (\xi_j^{h,\delta})_1 - \delta(u_j^{h,\delta})_1\Big]$$
and
$$e_2^{h,\delta} = \sum_{j=0}^{T/\delta - 1}\Big[F_\eta\big((\xi_{j+1}^{h,\delta})_1\big) - F_\eta\big((\xi_j^{h,\delta})_1\big)\Big] - f_\eta\big((\xi_j^{h,\delta})_1\big)\Big[(\xi_{j+1}^{h,\delta})_1 - (\xi_j^{h,\delta})_1\Big].$$
Owing to local consistency, $e_1^{h,\delta} \to 0$ in probability. We now use the fact that $F_\eta$ approximates the absolute value function near the origin to deal with the term $e_2^{h,\delta}$. If $z_1$ and $z_2$ are smaller than $\gamma$, then
$$e_2^{h,\delta} = \sum_{j=0}^{T/\delta - 1}\Big[f_\eta\big((\xi_j^{h,\delta})_1\big) - f_\eta\big((\zeta_j^{h,\delta})_1\big)\Big]\Big[(\xi_{j+1}^{h,\delta})_1 - (\xi_j^{h,\delta})_1\Big] I_{\{|(\xi_j^{h,\delta})_1| \le \gamma/4\}}$$
and $(\zeta_j^{h,\delta})_1$ is a point between $(\xi_j^{h,\delta})_1$ and $(\xi_{j+1}^{h,\delta})_1$. Since $\xi^{h,\delta}(\cdot)$ converges uniformly to a process with continuous sample paths,
in probability. Since
$$\int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[0,\gamma]}\big((x(s))_1\big)\, \nu^{(2)}(d\alpha\, ds) - \int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[-\gamma,0]}\big((x(s))_1\big)\, \nu^{(1)}(d\alpha\, ds) \le 4\gamma.$$
$$F(z) = \begin{cases} z & \text{if } |z| \le \gamma \\ \gamma & \text{if } z > \gamma \\ -\gamma & \text{if } z < -\gamma \end{cases}$$
shows that
$$\int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[0,\gamma]}\big((x(s))_1\big)\, \nu^{(2)}(d\alpha\, ds) + \int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[-\gamma,0]}\big((x(s))_1\big)\, \nu^{(1)}(d\alpha\, ds) \le 4\gamma.$$
$$\int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[-\gamma,0]}\big((x(s))_1\big)\, \nu^{(1)}(d\alpha\, ds) > -8\gamma,$$
$$\int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{[0,\gamma]}\big((x(s))_1\big)\, \nu^{(2)}(d\alpha\, ds) < 8\gamma.$$
$$\int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{\{0\}}\big((x(s))_1\big)\, \nu^{(1)}(d\alpha\, ds) \ge 0, \qquad \int_{\mathbb{R}^k \times [0,T]} (\alpha)_1 I_{\{0\}}\big((x(s))_1\big)\, \nu^{(2)}(d\alpha\, ds) \le 0 \tag{3.13}$$
w.p.1.
The argument that led to (3.13) can be repeated with $s$ restricted to any interval $[a, b] \subset [0, T]$ with the same conclusion. Thus we can assume it holds simultaneously for all such intervals with rational endpoints. Using the definitions of $\nu_s^{(1)}(\cdot)$ and $\nu_s^{(2)}(\cdot)$, this implies
whenever $(x(s))_1 = 0$, a.s. in $s$ and with probability one. This proves (3.11).
Before proving the upper bound we present the following lemma. The lemma implies the existence of an $\epsilon$-optimal piecewise linear path for the calculus of variations problem and is analogous to Theorem 2.2. The proof of the lemma is given at the end of the section.
Lemma 3.3. Assume the conditions of Theorem 3.1. Then given any $\epsilon > 0$ and any absolutely continuous path $\phi : [0, T] \to \mathbb{R}^k$ satisfying $\phi(0) = x$ and $\int_0^T k(\phi(s), \dot\phi(s))\, ds < \infty$, there exist $N < \infty$, $0 = t_0 < t_1 < \cdots < t_N = T$, and a function $u^\epsilon : [0, T] \to \mathbb{R}^k$ which is constant on the intervals $[t_n, t_{n+1})$, $n < N$, such that if
then
and
$$\sup_{0 \le t \le T}\Big[\int_0^t k(\phi^\epsilon(s), u^\epsilon(s))\, ds - \int_0^t k(\phi(s), \dot\phi(s))\, ds\Big] \le \epsilon.$$
Furthermore, we can assume for any $n < N$ that either $(\phi^\epsilon(t))_1 \neq 0$ for all $t \in (t_n, t_{n+1})$ or $(\phi^\epsilon(t))_1 = 0$ for all $t \in (t_n, t_{n+1})$.
Proof of the Upper Bound. Fix $\epsilon > 0$, and choose $\phi(\cdot)$ with $\phi(0) = x$ such that if $\tau = \inf\{t : \phi(t) \in \partial G\}$, then
$$\int_0^{T \wedge \tau} k(\phi(s), \dot\phi(s))\, ds + g\big(\phi(T \wedge \tau),\, T \wedge \tau\big) \le V(x) + \epsilon. \tag{3.14}$$
The case $\tau > T$. By Lemma 3.3 and (A3.2), there exists $\phi^\epsilon(\cdot)$ satisfying the conditions of the lemma and also
(3.15)
For each $n < N$, let $\alpha_n = \dot\phi^\epsilon(t)$, where $t$ is any point in $(t_n, t_{n+1})$. If $(\phi^\epsilon(t))_1 \neq 0$ for $t \in (t_n, t_{n+1})$, then we define $\alpha_n^{(1)} = \alpha_n^{(2)} = \alpha_n$. If $(\phi^\epsilon(t))_1 = 0$ for $t \in (t_n, t_{n+1})$, then we must prescribe a control that will yield a running cost close to $k^{(0)}(\phi^\epsilon(t), \alpha_n)$. By exploiting the continuity of the $k^{(j)}(\cdot, \cdot)$, $j = 1, 2$, there exist $p_n^{(1)}, p_n^{(2)}, \alpha_n^{(1)}$ and $\alpha_n^{(2)}$ which satisfy
$$p_n^{(1)} + p_n^{(2)} = 1, \quad p_n^{(1)} > 0, \quad p_n^{(2)} > 0, \tag{3.16}$$
(3.17)
$$p_n^{(1)}\alpha_n^{(1)} + p_n^{(2)}\alpha_n^{(2)} = \alpha_n, \tag{3.18}$$
and
for all $t \in (t_n, t_n + \delta)$, where $\delta$ is some positive number. Because $\dot\phi^\epsilon(t)$ is not constant, it may not actually be the case that (3.19) can be guaranteed for all $t \in (t_n, t_{n+1})$ simultaneously. However, the continuity properties of $k^{(i)}(\cdot, \cdot)$ and $\phi^\epsilon(\cdot)$ imply that we can replace the original partition by a finer partition $0 = t_0 < \cdots < t_{\bar N} = T$ (if necessary) such that if $(\phi^\epsilon(t))_1 = 0$ for $t \in (t_n, t_{n+1})$, then (3.19) holds for all $t \in (t_n, t_{n+1})$. For simplicity, we will retain the same notation for the original and refined partition.
For the remainder of the proof we will assume that $h$ and $\delta$ are small enough that
$$\cup_{n=0}^{N-1}\{\alpha_n^{(1)}, \alpha_n^{(2)}\} \subset U^{h,\delta}.$$
We can then define a nonanticipative control scheme for $\{\xi_i^{h,\delta}, i < \infty\}$ in terms of the $\alpha_n^{(j)}$'s. For $i\delta \in (t_n, t_{n+1})$, we set
$$u_i^{h,\delta} = \begin{cases} \alpha_n^{(1)} & \text{if } (\xi_i^{h,\delta})_1 \le 0 \\ \alpha_n^{(2)} & \text{if } (\xi_i^{h,\delta})_1 > 0. \end{cases}$$
Define $\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \nu^{(1),h,\delta}(\cdot), \nu^{(2),h,\delta}(\cdot)$ and $\tau_{h,\delta}$ as in the proof of the lower bound, but for this new control sequence. Because the $\alpha_n^{(j)}$'s are all bounded, the collection
Let the Skorokhod representation be used, and fix any $\omega$ for which there is convergence. Let $\nu_s^{(1)}(\cdot)$ and $\nu_s^{(2)}(\cdot)$ be the derivatives of $\nu^{(1)}(\cdot)$ and $\nu^{(2)}(\cdot)$, respectively.
Fix $n < N$, and assume for now that $x(t_n) = \phi^\epsilon(t_n)$. It will be proved below that this implies $x(t_{n+1}) = \phi^\epsilon(t_{n+1})$, and therefore this assumption will be justified by induction. First consider the case $(\phi^\epsilon(t))_1 \neq 0$ for all $t \in (t_n, t_{n+1})$. Then by the definition of the control scheme
Therefore, $x(t_{n+1}) = \phi^\epsilon(t_{n+1})$ in this case. Next consider the case $(\phi^\epsilon(t))_1 = 0$ for all $t \in (t_n, t_{n+1})$. The definition of the control scheme $u^{h,\delta}(\cdot)$ over
(3.20)
a.s. for $s \in (t_n, t_{n+1})$. An elementary stability argument that uses the Lyapunov function $f(x) = |(x)_1|$ and (3.9), (3.12), (3.17), and (3.20) shows that $(x(s))_1 = 0$ for $s \in (t_n, t_{n+1})$. Therefore,
Note that under (3.17), $p_n^{(1)}$ and $p_n^{(2)}$ are uniquely determined by $p_n^{(1)} + p_n^{(2)} = 1$ and
This uniqueness implies $\int_{\mathbb{R}^k} \nu_s^{(j)}(d\alpha) = p_n^{(j)}$, and, therefore, $\dot x(s) = p_n^{(1)}\alpha_n^{(1)} + p_n^{(2)}\alpha_n^{(2)} = \alpha_n$ a.s. for $s \in (t_n, t_{n+1})$. Since we have assumed $x(t_n) = \phi^\epsilon(t_n)$, this implies $x(t_{n+1}) = \phi^\epsilon(t_{n+1})$. By induction, $x(s) = \phi^\epsilon(s)$ for $s \in [0, T]$. Together with $\phi^\epsilon(t) \in G^0$ for $t \in [0, T]$, this implies $\bar\tau > T$.
We now use the properties of the $\nu_s^{(i)}(\cdot)$ shown in the previous paragraph to prove the upper bound. For the given control scheme, we have
The boundedness of $\cup_{n=0}^{N-1}\{\alpha_n^{(1)}, \alpha_n^{(2)}\}$ and the dominated convergence theorem give
(3.23)
where
$$\begin{cases} \cdots & \text{if } (\phi^\epsilon(t))_1 < 0 \\ \cdots & \text{if } (\phi^\epsilon(t))_1 > 0 \\ \cdots & \text{if } (\phi^\epsilon(t))_1 = 0. \end{cases}$$
The definition of the $\alpha_n$ and the $\alpha_n^{(j)}$ imply that (3.23) is bounded above by
$$\int_0^T k(\phi^\epsilon(s), \dot\phi^\epsilon(s))\, ds + g(\phi^\epsilon(T), T) + T\epsilon \le V(x) + (2 + T)\epsilon.$$
Note that (3.22) and (3.23) hold w.p.1. By again using the boundedness of $\cup_{n=0}^{N-1}\{\alpha_n^{(1)}, \alpha_n^{(2)}\}$ and the dominated convergence theorem, we have
$$\limsup_{(h,\delta) \to 0} E_x^{m^{h,\delta}}\Big[\int_0^{T \wedge \tau_{h,\delta}}\int_{\mathbb{R}^k} k\big(\xi^{h,\delta}(s), \alpha\big)\, m^{h,\delta}(d\alpha\, ds) + \cdots\Big]$$
and, therefore,
$$\limsup_{(h,\delta) \to 0} V^{h,\delta}(x) \le V(x) + (2 + T)\epsilon.$$
arbitrary $\epsilon > 0$ and $\epsilon_1 > 0$, we can find a path $\phi^\epsilon(\cdot)$ according to Lemma 3.3 such that $\phi^\epsilon(t) \in G^0$ for $t \in [0, \tau - \epsilon_1)$ and $\phi^\epsilon(t) \notin G$ for $t \in [\tau, \tau + \epsilon_1]$. This allows us to control the time and location of exit. The proof now follows the same lines as for the case $\tau > T$. $\blacksquare$
for $1 \le j \le J$ and that $\max_j (d_j - c_j) < \gamma$. For simplicity, we assume $c_1 = 0$ and $d_J = T$. The required changes when this is not the case are slight. Define
$$u_j^\epsilon = \frac{1}{d_j - c_j}\int_{c_j}^{d_j} \dot\phi(s)\, ds.$$
If $c_{j+1} > d_j$, let $e_j^k$, $k \le K_j$, be such that $d_j = e_j^1 < e_j^2 < \cdots < e_j^{K_j} = c_{j+1}$ and $\max_k (e_j^{k+1} - e_j^k) < \gamma$. Note that $(\phi(t))_1 \neq 0$ for $t \in (d_j, c_{j+1})$. Define
Finally, we set
$$u^\epsilon(t) = \sum_{j=1}^J\Big(I_{\{t \in [c_j, d_j)\}}\, u_j^\epsilon + \sum_{k=1}^{K_j - 1} I_{\{t \in [e_j^k, e_j^{k+1})\}}\, u_j^{\epsilon,k}\Big), \qquad \phi^\epsilon(t) = x + \int_0^t u^\epsilon(s)\, ds.$$
$$\le \int_{c_j}^{d_j} k(\phi(c_j), \dot\phi(s))\, ds \le \int_{c_j}^{d_j}\Big(k(\phi(s), \dot\phi(s)) + f(\epsilon)\big[M + k(\phi(s), \dot\phi(s))\big]\Big)\, ds.$$
The first inequality in the last display is due to the convexity of each of the $k^{(i)}(x, \cdot)$, $i = 0, 1, 2$, and the fact that $k^{(0)}(x, \alpha) \le k^{(1)}(x, \alpha) \wedge k^{(2)}(x, \alpha)$ whenever $(\alpha)_1 = 0$. Convexity and the definition of the $u_j^\epsilon$ imply that
$$\le \big(1 + (2 + f(\epsilon))f(\epsilon)\big)\int_{c_j}^{d_j}\big(k(\phi(s), \dot\phi(s)) + (2 + f(\epsilon))f(\epsilon)M\big)\, ds.$$
A similar estimate applies for each interval of the form $[e_j^k, e_j^{k+1})$. Combining the estimates over the different intervals gives the last inequality in the statement of the lemma. Finally, the last sentence of the lemma is a consequence of the definitions of the $c_j$, $d_j$, and $e_j^k$. $\blacksquare$
15 Problems from the Calculus of Variations: Infinite Time Horizon
this issue. One approach is to perturb the running cost so that it is strictly positive, and then remove the perturbation as $h \to 0$. This method is discussed in Subsections 15.3.1 and 15.3.2. It turns out that one cannot send the perturbation to zero in an arbitrary way, and even when the perturbation tends to zero at a sufficiently slow rate one must impose conditions on the "zero cost" sets $K(x) = \{\alpha : k(x, \alpha) = 0\}$. These conditions essentially take the form of a description of the large time behavior of any solutions to the differential inclusion $\dot\phi \in K(\phi)$. We first consider the case in which all solutions to the differential inclusion that start in a neighborhood of $G$ are either attracted to a single point $x_0 \in G^0$ or else leave $G$. This result is then extended to cover the case in which the limit set is given by a finite collection of connected sets.
The alternative approach uses the same numerical scheme (and related iterative solvers) as Section 15.2, but imposes greater restrictions on $K(x)$. The set $\{x : K(x) \neq \emptyset\}$ is assumed to be the union of a finite collection of connected sets, and additional conditions are assumed which imply that $V$ is constant on each of the sets in this collection. This approach is developed in Subsection 15.3.3 in the context of an interesting application, namely, a calculus of variations formulation of a problem in shape-from-shading. Also included in Subsection 15.3.3 is a weakening of the condition required of $G$ in which we allow the complement of $G$ to contain isolated points.
When approximating any stationary control problem an important issue
is the efficiency of the associated iterative methods for solving the dis-
cretized equations (see Chapter 6). Many different aspects of this issue are
considered in the chapter. For example, the special case where the running
cost is quadratic in the control (for each fixed value of the state) occurs
in many problems (e.g., large deviations and minimum time problems).
In Subsection 15.2.2 we show that if the "canonical" approximating chain
for a calculus of variations problem is used then the discretized dynamic
programming equation can be solved explicitly. This has significant impli-
cations for performance of the algorithm, since numerical minimizations
are not required for the iterative solvers discussed in Chapter 6. A special
case of this calculation is applied to the shape-from-shading problem in
Subsection 15.3.3, and numerical results are discussed there and for the
general case in Section 15.4. Section 15.4 also discusses other qualitative
aspects of the iterative solvers, and in particular the relation between the
number of iterations required for convergence of the solver and qualitative
properties of the approximating chain. It turns out that one can design
chains and associated iterative schemes that converge in a small number of
iterations, and with the number of iterations essentially independent of the
discretization level. This property makes the Markov chain approximations
very useful for industrial applications of minimum time and related shape
evolution problems. These features of the Markov chain approximations
were first explored in [14], and extensions to higher order accurate schemes
are developed in [48, 145].
15.1 Problems of Interest
where the infimum is over the same set of paths as in (1.1) and over all $\rho \ge 0$. Note that $g(\cdot)$ combines the stopping cost and the cost that is added when the set $G^0$ is exited. Thus, $g(\cdot)$ will often be discontinuous on $\partial G$.
Owing to the fact that $\rho$ is potentially unbounded, one must impose suitable conditions on the running cost $k(\cdot, \cdot)$ in order to guarantee that the minimal cost is bounded from below. We will make the assumption that $k(x, \alpha) \ge 0$ for all $x$ and $\alpha$. The mathematical problem (1.2) may still be well posed even if this condition is violated. However, because most current applications satisfy the nonnegativity condition we restrict our attention to this case.
Except where explicitly stated otherwise, the following assumptions are
used throughout the chapter.
A1.1. The set $G$ is compact and satisfies interior and exterior cone conditions: There exist $\epsilon > 0$ and continuous functions $v(\cdot)$ and $w(\cdot)$ such that given any $x \in \partial G$, $\cup_{0 < a \le \epsilon} B_{\epsilon a}(x + a v(x)) \subset G$ and $\cup_{0 < a \le \epsilon} B_{\epsilon a}(x + a w(x)) \cap G = \emptyset$.
A1.2. The function $k(\cdot, \cdot)$ is continuous and nonnegative, and it satisfies the superlinear growth condition (14.1.1). There exist $M < \infty$ and $f : \mathbb{R} \to [0, \infty)$ satisfying $f(t) \to 0$ as $t \to 0$, such that for all $x$ and $y$,
if $x \in G_h^0$, and $V^h(x) = g(x)$ for $x \notin G_h^0$. We then have the following result.
Theorem 2.1. Assume (A1.1), (A1.2), and (A1.3). Then for the scheme defined by (2.1) we have
The proof combines the ideas used in the proofs of Theorems 10.6.1 and 14.2.5 and is omitted.
for $t \in [0, \tau]$, where the control $v$ is square integrable over any interval $[0, T]$, $T < \infty$. Associated with these dynamics is the cost
$$W(x, v, \rho) = \frac{1}{2}\int_0^\rho\big[|v(t)|^2 + c(\phi(t))\big]\, dt + g(\phi(\rho)), \tag{2.2}$$
where $c, g : \mathbb{R}^k \to \mathbb{R}$ are continuous. Further assume that $v$ takes values in $\mathbb{R}^k$ and that $a(\cdot) = \sigma(\cdot)\sigma'(\cdot)$ is uniformly positive definite on $G$. By defining the running cost
$$k(x, \alpha) = \frac{1}{2}(\alpha - b(x))' a^{-1}(x)(\alpha - b(x)) + c(x),$$
the dynamics and the cost above can be rewritten in the calculus of variations form (1.2).
Note that a more general cost for the original control problem could also give rise to a calculus of variations problem of the given form. In particular, in place of $|v|^2$ in (2.2) one could consider a positive definite quadratic form [e.g., $(v - h(x))'A(x)(v - h(x))$] and still obtain a calculus of variations problem from within the class described above.
Other important examples are minimum time type problems, which can be converted into this form by a simple transformation of the independent variable. For example, the standard minimum time problem with dynamics $\dot x = u$, $|u| \le 1$, and cost $k(x, u) = 1$ can be put into the calculus of variations form (i.e., dynamics $\dot\phi = u$, $u$ unconstrained) by using the cost $k(x, u) = |u|^2/4 + 1$.
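To see why this choice of cost works, note (a short verification, not spelled out in the text) that traversing a straight segment of length $L$ at the constant speed $|u| = L/t$ over a time $t$ costs
$$\int_0^t\Big(\frac{|u|^2}{4} + 1\Big)\, ds = \frac{L^2}{4t} + t \ \ge\ 2\sqrt{\frac{L^2}{4}} = L,$$
with equality at $t = L/2$. Since $L$ is exactly the minimum time needed at speed $|u| \le 1$, the two formulations assign the same value to each segment, and hence to arbitrary paths by concatenation.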
Consider the natural approximating chain of Example 14.2.1. It turns
out that for this class of problems and with this approximating chain the
dynamic programming equation (2.1) can be solved more or less explic-
itly, and moreover the naturally associated Gauss-Seidel iterative scheme
for solving this equation [cf. Section 6.2] is extremely efficient. In all two
and three dimensional problems on which it has been tested, the iterative scheme converges to the solution after a small number of iterations, with the number of iterations essentially independent of the discretization level $h$. These properties of the iterative scheme can be best understood by
using its interpretation as a functional of a controlled Markov chain, and
discussion on this point is given in Section 15.4. The purpose of the present
section is to indicate how the quadratic cost and properties of the approxi-
mating chain can be exploited to efficiently solve the dynamic programming
equation.
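A minimal sketch of such a Gauss-Seidel sweep follows, under illustrative assumptions not taken from the text: one space dimension, the chain of Example 14.2.1 for $\dot x = u$, running cost $|u|^2/4 + 1$ as in the minimum time example above, a zero-cost target at the origin, and a finite grid of speeds. Updated values are used within a sweep, and the sweep direction alternates; this is what keeps the iteration count nearly independent of $h$.

import numpy as np

# Gauss-Seidel solution of V^h(x) = min_a [ sum_y p(x,y|a) V^h(y)
#   + k(x,a) * dt^h(a) ]. With the canonical 1-d chain, a step of size h
# in the direction of a takes time dt = h/|a|, so each candidate update is
# V(neighbor) + (|a|/4 + 1/|a|) * h, with k(x,a) = |a|^2/4 + 1.
h = 0.01
xs = np.arange(-1.0, 1.0 + h / 2, h)
target = np.argmin(np.abs(xs))              # target set {0}, cost 0
speeds = np.linspace(0.2, 4.0, 40)          # candidate values of |a|

V = np.full(len(xs), 1e6)
V[target] = 0.0
for sweep in range(100):
    old = V.copy()
    # alternate sweep direction so information propagates both ways fast
    order = range(1, len(xs) - 1) if sweep % 2 == 0 \
        else range(len(xs) - 2, 0, -1)
    for j in order:
        if j == target:
            continue
        for s in speeds:
            cost = (s / 4.0 + 1.0 / s) * h   # k(x,a)*dt, minimized near |a|=2
            V[j] = min(V[j], V[j - 1] + cost, V[j + 1] + cost)
    if np.max(np.abs(V - old)) < 1e-12:
        print("converged after", sweep + 1, "sweeps")
        break
print(V[0], abs(xs[0]))   # ~ |x|, the distance to the target, as expected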
Although we do not discuss the matter here, the degenerate case [i.e.,
when a(x) is not positive definite] is also of interest, and under appropriate
conditions this case can be dealt with just as easily [14]. We also note that
schemes with higher order accuracy can be based on these Markov chain
approximations [48, 145], and that the iterative schemes used to solve for
these higher order approximations inherit the favorable properties of the
lower order scheme considered here.
When solving the dynamic programming equation (2.1) by an iterative
scheme, the infima that must be calculated at each iteration depend on the
procedure chosen (e.g. Jacobi, Gauss-Seidel). However, for all the standard
iterative schemes the quantities that must be computed are always of the
form
where $f$ is a real valued function defined on the grid $h\mathbb{Z}^k$ and which depends on the particular iterative method used. Let $\|\alpha\|_1 = \sum_{i=1}^k |\alpha_i|$. For any $x \in G_h$ and for the given $p^h$ and $\Delta t^h$, the infimization becomes
    (1/(1'A1)) [ 1'Aβ − νβ'1 + √s ].    (2.5)

To see this, note that the minimizer takes the form

    w = β − (1/ν) ( μ + (νβ'1 − 1'Aβ − √s)/(1'A1) ) 1,

while the objective, written as a function of the scalar μ, is

    (1/(1'A1)) { νμ/2 + s/(2νμ) + 1'Aβ − νβ'1 },

which is to be infimized over μ > 0. Clearly, if s < 0 then the infimum is equal to −∞. If s ≥ 0 then the derivative with respect to μ has two roots, of which only the positive root, μ* = √s/ν, satisfies the constraint on μ and gives a local minimum, which is in fact a global minimum. Substituting μ* in the expression for w gives the form of the minimizer, and substituting the minimizer in (2.4) yields the minimum value expressed in (2.5). ∎
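As a numerical sanity check of this scalar reduction (a sketch of our own, under the reconstruction above; ν and s as in the display):

```python
import math

def scalar_infimum(nu, s):
    """Infimum over mu > 0 of g(mu) = nu*mu/2 + s/(2*nu*mu).

    For s < 0 the infimum is -infinity, since s/(2*nu*mu) -> -inf as
    mu -> 0+.  For s >= 0 the unique positive stationary point is
    mu* = sqrt(s)/nu, and g(mu*) = sqrt(s), matching the sqrt(s) term
    appearing in (2.5)."""
    if s < 0:
        return None, -math.inf
    mu_star = math.sqrt(s) / nu
    return mu_star, math.sqrt(s)
```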
    inf_{α: α_i ≥ 0, α ≠ 0} (1/‖α‖₁) { h k(x, α) + Σ_i α_i f(x + he_i) }

    = inf_{α: α_i ≥ 0, α ≠ 0} (1/‖α‖₁) { (h/2)(α − b)'a⁻¹(α − b) + hc + α'f },    (2.7)

where f also denotes the vector with components f(x + he_i).
The solution to (2.7) will depend on whether or not b or c is 0, and thus we consider the two cases separately.
the interior of the orthant was under consideration, the first step is to compute the minimum as if the rest of the variables were unconstrained (save the constraint α₁ + ··· + α_{k−1} > 0). The solution of this lower dimensional problem requires a change of variables. Define ᾱ = (α₁, ..., α_{k−1})', b̄ = (b₁, ..., b_{k−1})', k̄ = (k₁, ..., k_{k−1})', and b̃ = b_k. We can then write (with matrices A₁₁, A₁₂, A₂₁, A₂₂ of appropriate dimension)
Lemma 2.2 can be applied once more to compute the minimum of this quantity. If the solution satisfies the constraints α_i ≥ 0, i = 1, ..., k−1, and if the gradient of the original objective function points into the k-dimensional orthant, then the search in the orthant is complete. If not, then the search continues through the remaining faces and, if necessary, through the lower dimensional faces. For example, if k = 3 and the minimum over the orthant {α₁ ≥ 0, α₂ ≥ 0, α₃ ≥ 0} is not found in the interior, the search continues on the faces {α₁ ≥ 0, α₂ ≥ 0, α₃ = 0}, {α₁ ≥ 0, α₃ ≥ 0, α₂ = 0}, {α₂ ≥ 0, α₃ ≥ 0, α₁ = 0}. If the search there fails as well, one continues with the faces {α₁ ≥ 0, α₂ = α₃ = 0}, etc. Lemma 7.2 of [14] guarantees that the procedure will find the unique global minimum.
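A minimal sketch of this face-by-face search (our own illustration, not the book's code): for clarity we use a plain quadratic objective in place of (2.7), which additionally carries the 1/‖α‖₁ factor and for which Lemma 2.2 supplies closed-form face minimizers.

```python
import itertools
import numpy as np

def min_over_orthant(A, g):
    """Search the positive orthant for the minimum of
    q(a) = 0.5 * a'Aa + g'a (A, g numpy arrays), excluding a = 0,
    face by face as described in the text.  A is assumed positive
    definite, so each face problem has a unique unconstrained
    minimizer.  Returns (None, inf) if the only candidate is the
    excluded origin."""
    k = len(g)
    best, best_val = None, np.inf
    # Interior first (all k coordinates free), then ever lower
    # dimensional faces of the orthant.
    for r in range(k, 0, -1):
        for S in itertools.combinations(range(k), r):
            idx = list(S)
            a_S = np.linalg.solve(A[np.ix_(idx, idx)], -g[idx])
            if np.all(a_S >= 0) and np.any(a_S > 0):  # feasible, not the origin
                a = np.zeros(k)
                a[idx] = a_S
                val = 0.5 * a @ A @ a + g @ a
                if val < best_val:
                    best, best_val = a, val
    return best, best_val
```

Because the objective is convex, a face whose unconstrained minimizer is infeasible can be skipped: its constrained minimum lies on a lower dimensional face, which the enumeration visits anyway.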
where the infimum is over the same functions and stopping times as in the definition of V(x). Clearly, k^η(·, ·) satisfies the conditions of Subsection 15.2.1. Furthermore, it is quite easy to prove that V^η(x) ↓ V(x) as η → 0. Based on the strict positivity of k^η(·, ·), the methods of Subsection 15.2.1 can be applied to obtain approximations to V^η(x). If η > 0 is sufficiently small, then V^η(x) should be a good approximation to V(x). Although this is the basic idea behind the algorithm presented below, it turns out that we cannot send η → 0 and h → 0 in an arbitrary fashion. Rather, η and h must simultaneously be sent to their respective limits in a specific manner.
Numerical Schemes. Let p^h(x, y|α) and Δt^h(α) satisfy the local consistency conditions (14.2.1) and (14.2.2). For each η > 0, we define an approximation V_η^h(x) to V^η(x) by

    V_η^h(x) = inf_α [ Σ_y p^h(x, y|α) V_η^h(y) + k^η(x, α) Δt^h(α) ]    (3.2)

if x ∈ G_h^0, and V_η^h(x) = g(x) for x ∉ G_h^0. It follows from the positivity of η that (3.2) uniquely defines V_η^h(x). Our approximation to V(x) will then be given by V^h(x) = V^h_{η(h)}(x), where η(h) tends to zero as h → 0.
Theorem 3.1. Assume (A1.1), (A1.2), and (A1.3), and let η(h) > 0 be any sequence satisfying η(h) → 0 as h → 0. Then

    lim sup_{h→0} V^h(x) ≤ V(x).
Proof. Let any ε > 0 be given. Because k^η(·, ·) ↓ k(·, ·), there exist a relaxed control m(·) and an associated stopping time ρ < ∞ such that if

    x(t) = x + ∫_0^t ∫_{ℝ^k} α m(dα ds),    τ = inf{t : x(t) ∈ ∂G},

then

    ∫_0^{ρ∧τ} ∫_{ℝ^k} k^η(x(s), α) m_s(dα) ds + g(x(ρ ∧ τ)) ≤ V(x) + ε

for all sufficiently small η > 0. Using Theorems 14.2.2 and 14.2.3 in the same way as they were used in the proof of Theorem 14.2.5, we can extend
same way as they were used in the proof of Theorem 14.2.5 we can extend
the definition ofm(·) beyond T such that m(·), x(·), and p have the following
properties. Either
x(·) E CO for all t E [O,p] (3.3)
or p > T, and there is v > 0 such that
x(t) E G 0 fortE [0, r), x(t) fl. G fortE (r, T + v],
1t k11 (x(s),a)m 8 (da)ds-+ 0 as t..).. T
{3.4)
uniformly in all sufficiently small 7J > 0. The remainder of the proof for
these two possibilities is analogous to the proof of Theorem 14.2.5 for the
two cases of (14.2.22) and {14.2.23). We will give the details only for the
case of {3.3). The proof for (3.4) is a straightforward combination of the
ideas used in the proofs for the cases {3.3) and {14.2.23).
By Theorem 14.2.2 we may assume the existence of a finite set U_{ε₁} ⊂ ℝ^k, δ > 0, and an ordinary control u^{ε₁}(·) with the following properties: u^{ε₁}(·) takes values in U_{ε₁}, is constant on intervals of the form [jδ, jδ + δ), and, if x^{ε₁}(·) is the associated solution, then

    sup_{0≤t≤ρ} |x(t) − x^{ε₁}(t)| ≤ ε₁

and

    sup_{0≤t≤ρ} | ∫_0^t ∫_{ℝ^k} k(x(s), α) m_s(dα) ds − ∫_0^t k(x^{ε₁}(s), u^{ε₁}(s)) ds | ≤ ε₁.
In order to apply u^{ε₁}(·) to the chain {ξ_i^h, i < ∞}, we recursively define the control applied at discrete time i by u_i^h = u^{ε₁}(t_i^h) until the first i for which t_{i+1}^h ≥ ρ, and set ρ_h equal to this t_{i+1}^h; then ρ_h → ρ. The proof from this point on is an easy consequence of the fact (due to Theorem 14.2.4) that sup_{0≤t≤ρ_h} |ξ^h(t) − x^{ε₁}(t)| → 0 in probability, and is the same as that of the finite time problem treated in Theorem 14.2.5. ∎
It will turn out that much more is required for the lower bound, including a condition on the rate at which η(h) → 0. In the next example we show that if η(h) → 0 too quickly, then

    lim inf_{h→0} V^h(x) < V(x)

is possible.
Clearly, lim inf_{h→0} V^h_{η(h)}(0, 0) ≥ 0. To prove lim sup_{h→0} V^h_{η(h)}(0, 0) ≤ 0, we will exhibit a control scheme and η(h) > 0 that achieve cost arbitrarily close to zero. For each i, let
Note that the running cost for this control process is identically zero. With this definition of the control, the process {ξ_i^h, i < ∞} always satisfies |(ξ_i^h)₁ + (ξ_i^h)₂| < 1/2 (for small enough h). Thus the control scheme
accumulates no running cost and also prohibits the process from exiting near either the northeast or southwest corners (where the exit cost is high). However, it does allow exit near either of the other two corners, where the exit cost is zero. The process {(ξ_i^h)₁ − (ξ_i^h)₂, i < ∞} is easily recognized as a symmetric random walk on {ih, i ∈ ℤ}. As is well known, this implies that for any given value M, there exists i < ∞ such that |(ξ_i^h)₁ − (ξ_i^h)₂| ≥ M (w.p.1). Therefore, given any ε > 0 and the initial condition ξ_0^h = (0, 0), there exists n < ∞ such that the process ξ_i^h will enter ∂G_h at some point x [with g(x) = 0] by discrete time n and with probability greater than 1 − ε. Thus, lim sup_{η→0} V^h_η(0, 0) = 0, which demonstrates the existence of η(h) > 0 such that V^h_{η(h)}(0, 0) → 0. ∎
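The recurrence fact invoked above is easy to test numerically; a minimal sketch (ours, purely illustrative):

```python
import random

def steps_to_reach(level):
    """Number of +-1 steps a symmetric random walk started at 0 needs
    to reach absolute value `level`.  This is finite w.p.1 (with
    expected value level**2), which is the property the example uses
    to force exit through a zero-cost corner."""
    pos = steps = 0
    while abs(pos) < level:
        pos += random.choice((-1, 1))
        steps += 1
    return steps
```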
    |o(|α| Δt^h(α))| ≤ c(h) |α| Δt^h(α),

imply
Remarks. Thus, solutions with zero running cost either are attracted to the point x₀ or else leave an open neighborhood of G in finite time. The sets K(x) often have a particular significance for the underlying problem which gave rise to the calculus of variations problem. See the book [59] and the example of the next subsection. Note that we may, of course, have points x where K(x) = ∅.
Theorem 3.2. Assume (A1.1), (A1.2), (A1.3), and (A3.1). Then for the scheme defined by (3.5) and (3.2), we have

    lim inf_{h→0} V^h(x) ≥ V(x).
Lemma 3.3. Assume (A1.2) and (A3.1). Then given any γ > 0 and M < ∞, there is T < ∞ such that if ẋ(t) = ∫_{ℝ^k} α m_t(dα) (a.s.) for an admissible relaxed control m(·), and if x(t) ∈ G − N_γ(x₀) for t ∈ [0, T], then ∫_0^T ∫_{ℝ^k} k(x(s), α) m(dα ds) ≥ M.
Proof of Theorem 3.2. It is quite easy to show under (A1.1) and (A1.2) that given ε > 0 there is γ > 0 such that |x − y| ≤ γ implies V(x) ≥ V(y) − ε for all x, y ∈ G [i.e., V(·) is uniformly continuous on G]. Thus, for any ε > 0, there is γ > 0 such that |V(x) − V(x₀)| ≤ ε for x ∈ N_γ(x₀). For this γ and M = V(x₀) + 1, choose T according to Lemma 3.3.
We first consider the case when x ∈ N_γ(x₀). Owing to the definition of V^h(x), there is a controlled Markov chain {ξ_i^h, u_i^h, i < ∞} that satisfies ξ_0^h = x and a controlled stopping time N_h such that

    V^h(x) ≥ E_x^h Σ_{j=0}^{(N_h∧M_h)−1} k^{η(h)}(ξ_j^h, u_j^h) Δt^h(u_j^h) + E_x^h g(ξ^h_{N_h∧M_h}) − ε,    (3.6)

where M_h is the time of first exit from G^0. Let ξ^h(·) and u^h(·) be the continuous parameter interpolations of {ξ_i^h, i < ∞} and {u_i^h, i < ∞}, respectively, and let

    ρ_h = Σ_{i=0}^{N_h−1} Δt^h(u_i^h),    τ_h = Σ_{i=0}^{M_h−1} Δt^h(u_i^h).
    V^h(x) ≥ E_x^h ∫_0^{ρ_h∧τ_h} ∫_{ℝ^k} k^{η(h)}(ξ^h(s), α) m^h(dα ds) + E_x^h g(ξ^h(ρ_h ∧ τ_h)) − ε,    (3.7)

where m^h(·) is the relaxed control representation of the ordinary control u^h(·). The superlinearity condition (14.1.1) implies
for some fixed b < ∞. Choose s(h) → ∞ such that c(h)s(h) → 0 and η(h)s(h) → ∞. This is always possible under (3.5). When combined with c(h)s(h) → 0, the last equation implies

w.p.1. Fix an ω ∈ Ω such that these convergences hold. Define
and define

    T_h = (ρ_h ∧ τ_h) − σ_h  if σ_h < ρ_h ∧ τ_h,
    T_h = 0                  if σ_h ≥ ρ_h ∧ τ_h.

Thus, T_h is the interpolated time between ρ_h ∧ τ_h and the time before that when ξ^h(·) last left N_γ(x₀). We will now carefully examine the sample paths over the interval [σ_h, ρ_h ∧ τ_h].
Define ξ̄^h(·) = ξ^h(· + σ_h) and define m̄^h(·) to be the relaxed control representation of the ordinary control u^h(· + σ_h). Finally, let v_h = (ρ_h ∧ τ_h)/s(h) and

Since {m̄^h(·), h > 0} is always precompact on the space R(ℝ^k × [0, ∞)), we can extract a convergent subsequence, with limit denoted by (m̄(·), T, ρ, τ, σ, T̄, v).
The case v > 0. Since v_h = (ρ_h ∧ τ_h)/s(h), this case implies that ρ_h ∧ τ_h → ∞ quite fast. In fact, the condition η(h)s(h) → ∞ implies
For the remainder of the proof we can assume v = 0. This implies that ρ_h ∧ τ_h < s(h) for sufficiently small h > 0 and, in particular, gives the estimate (3.8) without s(h) appearing. We can also exclude the case τ = ∞, because in this case the running costs again tend to ∞. An argument very similar to that used in Theorem 14.2.4 shows that (3.9) holds as h → 0 for t ∈ [0, T]. For the given ω it follows from (3.8) that
where

    x̄(t) − x̄(0) = ∫_0^t ∫_{ℝ^k} α m̄(dα ds)

for t ∈ [0, T]. Note that by construction x̄(0) ∈ N_γ(x₀) and x̄(t) ∉ N_{γ/2}(x₀) for t ∈ (0, T]. Thus, we are examining the limits of ξ^h(·) after the processes have left N_{γ/2}(x₀) for the last time. Note also that it may be the case that σ = ∞, in which case x̄(·) cannot be determined from x(·).
The case ρ ∧ τ < ∞. For this case, we are essentially back in the setting of the finite time problem, and the proof of Theorem 14.2.5 shows that (3.11) and (3.12) hold.
If we combine all the cases and use the assumption that x ∈ N_γ(x₀), we obtain

w.p.1. Therefore,

    lim inf_{h→0} V^h(x) ≥ V(x) − 3ε  uniformly for all x ∈ N_γ(x₀).    (3.13)
For case (3), the definition of T_h implies ξ^h(T_h) ∈ N_γ(x₀) for all small h > 0. Using the Markov property and the previous estimates for paths that start inside N_γ(x₀), we have

where

    V̄(x) = inf { ∫_0^T k(φ, φ̇) ds + V(φ(T)) : φ(0) = x, φ(T) ∈ N_γ(x₀), T > 0 }.
By the dynamic programming principle of optimality, V̄(x) is bounded below by V(x). For the last case of T = τ = ρ = ∞, Lemma 3.3 gives

    lim inf_{h→0} ∫_0^{ρ_h∧τ_h} ∫_{ℝ^k} k^{η(h)}(ξ^h(s), α) m^h(dα ds) = ∞.
The proof of Theorem 3.2 can easily be extended. One such extension uses the following generalization of (A3.1).
A3.2. There exist a finite collection of disjoint compact sets {K_i, i = 1, ..., N} and an open set G₁ containing G with the following properties. We require that ∪_{i=1}^N K_i ⊂ G^0, and that given any γ > 0, there exists T < ∞ such that for all t ≥ T,

    x(0) ∈ G^0,    ẋ(s) = ∫_{ℝ^k} α m_s(dα) (a.s.)

and

    ∫_0^t ∫_{ℝ^k} k(x(s), α) m(dα ds) = 0

imply x(t) ∈ ∪_{i=1}^N N_γ(K_i).
Outline of the Proof. The proof is essentially the same as that of Theorem 3.2 but requires a good deal more notation. The constructions that would be used are very similar to those used in the proof of Theorem 3.7 below. An outline of the proof is as follows. Suppose that the quantities ξ^h(·), m^h(·), τ_h, ρ_h, and so on, are defined as in Theorem 3.2. Assume the Skorokhod representation is used and that a convergent subsequence is under consideration. If the limit of ρ_h ∧ τ_h is finite, we are dealing with a finite time problem (at least as far as the convergence of the costs is concerned). We therefore assume that this is not the case. By suitably defining random times, we can keep track of the excursions of the chain between the neighborhoods of the sets K_i; asymptotically, each excursion from N_γ(K_i) to N_γ(K_j) costs at least

    inf { ∫_0^T k(φ, φ̇) ds : φ(0) ∈ N_γ(K_i), φ(T) ∈ N_γ(K_j), T > 0 } − ε.
Proof of Lemma 3.4. The proof is very close to that of Lemma 2.2 in [59]. We first claim there exist T < ∞ and c > 0 such that x(t) ∈ G − (∪_{n=1}^N N_γ(K_n)) for t ∈ [0, T] implies ∫_0^T ∫_{ℝ^k} k(x(s), α) m(dα ds) ≥ c. If not, we can find T_i → ∞, x_i(0) ∈ G − ∪_{n=1}^N N_γ(K_n), and m_i(·) ∈ R([0, T_i] × ℝ^k) such that, for x_i(·) defined by

    x_i(t) = x_i(0) + ∫_0^t ∫_{ℝ^k} α m_i(dα ds),

    ∫_0^{T_i} ∫_{ℝ^k} k(x_i(s), α) m_i(dα ds) → 0.

We can choose a convergent subsequence such that

    ∫_0^∞ ∫_{ℝ^k} k(x(s), α) m(dα ds) = 0,

which contradicts (A3.2). Thus, there are T₁ < ∞ and c > 0 satisfying the conditions of the claim. The conclusion of the lemma now follows if we take T = T₁M/c. ∎
between the heights based on this reconstruction. We will also assume that an upper bound B is available for f(·) on G. The set G is given a priori and represents the subset of the imaging plane on which data is recorded. G is often larger than the domain on which the reconstruction of f(·) is desired. We say that a set A ⊂ ℝ² is smoothly connected if, given any two points x and y in A, there is an absolutely continuous path φ : [0, 1] → A such that φ(0) = x and φ(1) = y.
A3.3. Let H ⊂ G be a compact set that is the closure of its interior, and assume H is of the form H = ∩_{j=1}^J H_j, J < ∞, where each H_j has a continuously differentiable boundary. Assume that f_x is continuous on the closure of G, and that K = {x : I(x) = 1} consists of a finite collection of disjoint, compact, smoothly connected sets. Let L be the set of local minima of f(·) inside H, and define n_j(x) to be the inward (with respect to H) normal to ∂H_j at x. Assume that the value of f(·) is known at all points in L, and that f_x(x)'n_j(x) < 0 for all x ∈ ∂H ∩ ∂H_j, j = 1, ..., J.
Remarks. It turns out that the minimizing trajectories for the calculus
of variations problem to be given in Theorem 3.6 below are essentially the
two dimensional projections of the paths of steepest descent on the surface
represented by the height function. Thus, the assumptions that are placed
on H in (A3.3) guarantee that any minimizing trajectory that starts in H
stays in H. Theorem 3.6 shows that the height function has a representation
as the minimal cost function of a calculus of variations problem that is
correct for all points in the union of all sets H satisfying (A3.3). If we
consider an initial point x E G such that the minimizing trajectory exits
G, then we cannot construct f(·) at x by using the calculus of variations
representation because this reconstruction would require I(x) for values of
x outside G. If we assume that the height function is specified at the local
maximum points, then we can consider an analogous calculus of variations
problem with a maximization.
The following theorem is proved in [124].
Theorem 3.6. Assume (A3.3), and for x ∈ ℝ² and α ∈ ℝ² define

    g(x) = f(x) for x ∈ L,    g(x) = B for x ∉ L,

and

    k(x, α) = (1/2)|α|² + (1/2)(1/I(x)² − 1) = (1/2)|α|² + (1/2)|f_x(x)|²,

where the second equality uses the shape-from-shading relation I(x) = (1 + |f_x(x)|²)^{−1/2}, so that 1/I(x)² − 1 = |f_x(x)|².
Define

    V(x) = inf [ ∫_0^{ρ∧τ} k(φ(s), φ̇(s)) ds + g(φ(ρ ∧ τ)) ],

where τ = inf{t : φ(t) ∈ ∂G ∪ L} and the infimum is over all ρ > 0 and absolutely continuous functions φ : [0, ρ] → G that satisfy φ(0) = x. Then

    V(x) = f(x)
for all x ∈ H.
Remark. If g(x) is set to a value that is less than f(x) for some x ∈ H, then, in general, we will not have V(x) = f(x) for x ∈ H. For example, if y is any point at which I(y) = 1 and if g(y) < f(y), then V(y) ≤ g(y) < f(y).
We next present a numerical procedure for solving for V(x). One feature of this problem that is quite different from those considered previously is the nature of the target set. For example, consider the case when L = {x₀} for some x₀ ∈ G^0. The target set is then ∂G ∪ {x₀}, and (A1.1) does not apply. The interior cone condition holds, but the exterior cone condition fails. The exterior cone condition was used in the proof of the upper bound for all convergence theorems that have been presented so far in this chapter. In those proofs, if an optimal (or ε-optimal) path φ(·) terminated on a target set ∂G at time ρ, then the exterior cone condition was used to define φ(·) on a small interval (ρ, ρ + ν], ν > 0, in such a way that the added cost was arbitrarily small and φ(t) ∉ G for t ∈ (ρ, ρ + ν]. This φ(·) was then used to define a control scheme for the chain, and because φ(·) had been constructed in this way, the exit times of the chain converged to ρ. See, for example, Subsection 14.2.3. If the target set does not satisfy an exterior cone condition, then this construction is no longer possible. Target sets such as an isolated point are typically difficult to deal with when proving the convergence of numerical schemes. A common technique is to replace the target set A by N_γ(A), γ > 0, prove an appropriate convergence property for the problem with this target set, and then send γ → 0. We will show in this subsection that this "fattening" of the target set is not needed when a mild additional condition on the chain is assumed to hold.
Let V_T(x) denote the optimal cost if the controlled stopping time is restricted to the range [0, T]. Our assumption that B is an upper bound for f(·) on G implies that it is never optimal to stop at a point in G − L whenever T is sufficiently large. The stopping cost for points in G − L was
actually introduced in the definition of the calculus of variations problem
of Theorem 3.6 solely for the purpose of forcing optimal trajectories to
terminate in L. This use of a stopping cost could be avoided altogether if
the minimization in the calculus of variations problem were only over paths
that terminate in L at some finite time. However, this added constraint
would be rather difficult to implement in the numerical approximations.
We will see below that because g( ·) is the proper stopping cost to introduce
to force the trajectories to terminate in L, it also provides the proper initial
condition for the numerical scheme.
Because the target set can possibly contain isolated points that may not be included in G_h, we really need to introduce a "discretized target set" L_h ⊂ G_h and redefine g(·) in the obvious way. We would need that L_h → L in the Hausdorff metric [i.e., d(x, L) ≤ ε_h for all x ∈ L_h, d(x, L_h) ≤ ε_h for all x ∈ L, and ε_h → 0]. To simplify the notation we will just assume L ⊂ G_h.
Unlike the general situation of Subsection 15.3.1, for this problem we can approximate the finite time problem V_T(x) and then send T → ∞ and the discretization parameters to their limits in any way we choose. This allows the use of the cost k(·, ·), rather than k^{η(h)}(·, ·), in the following alternative to the method of Subsection 15.3.1.
The existence of the limit in the definition of V^h(x) follows from the monotonicity V^h_{n+1}(x) ≤ V^h_n(x). Note that we do not specify V^h as the solution to a dynamic programming equation. This would not be correct, since the only possible dynamic programming equation has multiple solutions. Instead, V^h is defined as the limit of an iterative scheme with a particular initial condition, which is chosen to pick out the "right" solution.
Remark. The iteration (3.15) is of "Jacobi" type (see Chapter 6). Convergence can also be demonstrated when (3.15) is replaced by a "Gauss-Seidel" type of iteration [46].
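To fix ideas, here is a minimal sketch of the difference between the two pass types; `update(V, x)` is a hypothetical stand-in for the right side of the recursion (3.15) at grid point x, whose exact form depends on the approximating chain and is not reproduced here:

```python
def jacobi_pass(V, points, update):
    # Jacobi: every point is updated from the previous iterate.
    V_new = dict(V)
    for x in points:
        V_new[x] = update(V, x)
    return V_new

def gauss_seidel_pass(V, points, update):
    # Gauss-Seidel: values updated earlier in the same pass are reused
    # immediately, so boundary information can cross the grid in a
    # single sweep when the ordering opposes the flow.
    for x in points:
        V[x] = update(V, x)
    return V
```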
{Vn"(x + h{1, 0)), Vn"(x- h{1, 0))} and {Vnh(x + h(O, 1)), Vn"(x- h(O, 1))},
respectively. Define m = {1/I2 (x))- 1. If 0 ~ h 2 m < (v 1 - v2 ) 2 , then we
use the recursion
2. Σ_{i=0}^{N_h−1} Δt^h(u_i^h) ≤ ε,

3. Σ_{i=0}^{N_h−1} k(ξ_i^h, u_i^h) Δt^h(u_i^h) ≤ ε

w.p.1. Hence, if the proof of the upper bound can be given when the target set is of the form N_γ(x₀) (i.e., a set that satisfies the exterior cone condition), then it can be given for target sets of the form {x₀} as well. The details will be given below. A formulation of conditions on the Markov chain that includes Example 14.2.1 and the chains derived from this example via (14.2.3) and (14.2.4) is as follows.
A3.4. Given ε > 0 there exist h₀ > 0, γ > 0, and M < ∞ with the following properties. Given any h < h₀ and any x, y ∈ G_h such that |x − y| < γ, there exists a nonanticipative control scheme {u_i^h, i < ∞} satisfying |u_i^h| ≤ M, i < ∞, with the following properties. If {ξ_i^h, i < ∞} is the resulting controlled chain that starts at x and if N_h = inf{i : ξ_i^h = y}, then
Remark. Recall that L was defined in (A3.3) as the set of local minima inside H. The points {x : I(x) = 1, x ∈ G − L} are the singular points that are not local minima inside H. At these points we have k(x, 0) = 0, and thus the assumptions used in Subsection 15.2.1 do not apply.
Proof of the Upper Bound. The proof of the upper bound follows the lines of the finite time problem of Theorem 14.2.5 except for the difficulties related to the nature of the target set. Fix x ∈ G − L. If V(x) = B, there is nothing to prove. Assume V(x) < B, and let ε > 0 be given such that V(x) + 2ε < B. Recall that V_T(x) is the minimal cost subject to the restriction ρ ∈ [0, T]. Since V_T(x) ↓ V(x), there exists T < ∞ such that V_T(x) ≤ V(x) + ε < B. If m(·) is an ε-optimal relaxed control for V_T(x) and if x(·) is the associated solution, then x(τ) ∈ L for some τ ≤ T,

    sup_{0≤t≤τ} |x(t) − x^{γ/2}(t)| ≤ γ/2,

and

    sup_{0≤t≤τ} | ∫_0^t ∫_{ℝ²} k(x(s), α) m_s(dα) ds − ∫_0^t k(x^{γ/2}(s), u^{γ/2}(s)) ds | ≤ ε,

where u^{γ/2}(·) is an ordinary control and x^{γ/2}(·) its associated solution.
We now define a control scheme for the Markov chain. We will use u^{γ/2}(·) to define the scheme until the interpolated time reaches T. If the chain has not yet entered the target set by this time, then we may have to extend the definition for times after T via (A3.4).
In order to apply u"'~l 2 (-) to the chain {ef,i
< oo}, we recursively define
the control applied at discrete time i by u~ = u"YI 2 (t~) and t~ 1 = t~ +
er'
~th (u7). This defines a control until i such that t~+ 1 ~ T. Let { i < 00}
be the chain that starts at x and uses this control.
Define

    N_h = inf{i : t_i^h ≥ T or ξ_i^h ∈ L or ξ_i^h ∉ G^0},

and let τ_h = t^h_{N_h}. By Theorem 14.2.4, we have sup_{0≤t≤τ_h} |ξ^h(t) − x^{γ/2}(t)| → 0 in probability, and P{|ξ^h_{N_h} − x(τ)| ≥ γ} → 0. Assume h > 0 is small enough that P{|ξ^h_{N_h} − x(τ)| ≥ γ} ≤ ε.
On the set where |ξ^h_{N_h} − x(τ)| < γ, we extend the definition of the control sequence for discrete times larger than N_h in such a way that (A3.4) is satisfied. We then stop the process at the discrete time Ñ_h at which the chain enters L or leaves G^0, and bound the resulting cost by

    E_x^h ∫_0^{τ_h} ∫_{ℝ²} k(ξ^h(s), α) m^h(dα ds)
      + P{|ξ^h_{N_h} − x(τ)| ≥ γ} B + ε sup_{y∈G, |α|≤M} k(y, α)
      + P{|ξ^h_{N_h} − x(τ)| < γ, ξ^h_{Ñ_h} ∈ L, and |ξ^h_{Ñ_h} − x(τ)| ≤ ε} sup_{|z−x(τ)|≤ε} f(z)
      + P{|ξ^h_{N_h} − x(τ)| < γ, ξ^h_{Ñ_h} ∉ L, or |ξ^h_{Ñ_h} − x(τ)| > ε} B.
Sending ε → 0 gives

    lim sup_{h→0} V^h(x) ≤ V_T(x).

Since T < ∞ is arbitrary,

    lim sup_{h→0} V^h(x) ≤ V(x).    (3.17)
For the rest of the proof we will assume that n has been chosen such that (3.17) holds. Owing to the definition of V^h_n(x), there is a controlled Markov chain {ξ_i^h, i < ∞} with control sequence {u_i^h, i < ∞} that satisfies ξ_0^h = x, and a finite stopping time N_h such that

    V^h_n(x) ≥ E_x^h Σ_{j=0}^{(N_h∧M_h)−1} k(ξ_j^h, u_j^h) Δt^h(u_j^h) + E_x^h g(ξ^h_{N_h∧M_h}) − ε,    (3.18)

where M_h is the time of first exit from G^0 or entrance into the set L. The stopping time N_h is the minimum of n and the controlled stopping time. Although the chain, control sequence, and stopping times depend on n, we will not indicate the dependence in the notation. Let ξ^h(·) and u^h(·) be the continuous parameter interpolations of {ξ_i^h, i < ∞} and {u_i^h, i < ∞}, respectively, and let

    ρ_h = Σ_{i=0}^{N_h−1} Δt^h(u_i^h),    τ_h = Σ_{i=0}^{M_h−1} Δt^h(u_i^h).
    V^h_n(x) ≥ E_x^h ∫_0^{ρ_h∧τ_h} ∫_{ℝ²} k(ξ^h(s), α) m^h(dα ds) + E_x^h g(ξ^h(ρ_h ∧ τ_h)) − ε,    (3.19)

where m^h(·) is the relaxed control representation of the ordinary control u^h(·).
Let K_q, q = 1, ..., Q, be disjoint compact connected sets such that K = ∪_{q=1}^Q K_q. The existence of such a decomposition has been assumed in the statement of Theorem 3.7. Now V(x) is constant on each K_q, so there exists γ > 0 such that (3.20) holds and such that the sets N_γ(K_q) are separated by a distance greater than γ for distinct q. Because the reflected light intensity I(·) is continuous, there is c > 0 such that (3.21) holds.
For simplicity, we will consider the proof of the lower bound for the case when the initial condition satisfies x ∈ N_{γ/2}(K_q) for some q. The general case follows easily using the same arguments. We define a sequence of stopping times by

    τ_0^h = 0,
    σ_j^h = inf{t ≥ τ_j^h : ξ^h(t) ∉ ∪_{q=1}^Q N_γ(K_q)},
    τ_j^h = inf{t ≥ σ^h_{j−1} : ξ^h(t) ∈ ∪_{q=1}^Q N_{γ/2}(K_q) or ξ^h(t) ∉ G^0}.
where ξ_j^h(·) = ξ^h(· + σ_j^h) and where m_j^h(·) is the relaxed control representation of the ordinary control u^h(· + σ_j^h). We consider (Ξ^h(·), M^h(·)) as taking values in the space

endowed with the usual product space topology. Owing to its definition, V^h_n(x) is uniformly bounded from above. Thus Theorem 14.2.4 shows that given any subsequence of {(Ξ^h(·), M^h(·)), h > 0}, we can extract a further subsequence that converges weakly, and that any limit point satisfies (3.22).
Define s_j^h = τ^h_{j+1} − σ_j^h and s^h = (s_0^h, s_1^h, ...). It also follows from (3.21) that there exists c̄ > 0 such that for all q₁ and q₂, (3.23) holds [e.g., c̄ = (2c)^{1/2} γ].
We now prove the lower bound lim inf_{h→0} V^h(x) ≥ V(x). Extract a subsequence along which these quantities converge to a limit. By (3.22), the s_j for j < J are finite w.p.1, and by (3.23) we can assume that J is uniformly bounded from above. By construction, if j < J − 1 and if x_j(s_j) ∈ N_{γ/2}(K_q) (which it must be for some q), then x_{j+1}(0) ∈ ∂N_γ(K_q). Recall that L ⊂ ∪_{q=1}^Q K_q and that L is the set of local minimum points. It follows from the definitions of τ_j^h and σ_j^h that if τ_h < ρ_h and ξ^h(τ_h) ∈ K_q for some q, then ξ^h(τ^h_{J−1}) ∈ N_{γ/2}(K_q) for that same q. On the other hand, if τ_h < ρ_h and ξ^h(τ_h) ∉ K_q for any q, then g(ξ^h(ρ_h ∧ τ_h)) = B. Therefore, in general,

    lim inf_{h→0} g(ξ^h(ρ_h ∧ τ_h)) ≥ g(x_{J−1}(s_{J−1})) − ε    (3.25)

w.p.1. Now consider the paths x_j(·), j < J < ∞. Clearly, x_0(0) ∈ N_γ(x), and by (3.20) we have |V(x_{j−1}(s_{j−1})) − V(x_j(0))| ≤ 2ε. By combining this with (3.24) and (3.25), we obtain
6). For many control problems, the approximation in policy space method has significant advantages over approximation in value space. However, these advantages disappear when approximating the solution to deterministic control problems, because of the nature of the controlled process φ̇(t) = u(t). While the approximating processes ξ^h(·) are not deterministic, the fact that they are "nearly" deterministic means that information contained in the boundary or stopping cost is passed back into the interior in the form of a propagating "front," with points between the boundary and the front essentially equal to their fixed point values, and with the values at points on the other side of the front having essentially no effect on the "solved" grid points. The position of the "front" moves only when the policy is updated, and so the extra iterations (under a fixed policy) required by iteration in policy space are of little benefit.
if x ∈ G_h^0, and V^h(x) = g(x) if x ∉ G_h^0. Suppose one were to use the approximation in value space method to solve (4.1). Let the iterates be denoted by V_i^h(x), and for some given function f(·), let the initial condition be V_0^h(x) = f(x). Then V_i^h(x) can be interpreted as the cost function of a controlled discrete time Markov chain that starts at x at time 0 and has transition probabilities p^h(x, y|α), running cost k(x, α)Δt^h(α), stopping set ∂G_h, stopping cost g(·), and terminal cost (assigned at time i if the chain has not yet entered ∂G_h) of f(·). See the discussion in Subsection 6.2.2.
For the calculus of variations problems considered in this chapter the only given data are the values of V(x) on ∂G, which must be propagated back into G^0 along the minimizing paths. The situation is similar with the discrete approximation V_i^h(x). The only given data are the values of V_i^h(x) at the points x ∈ ∂G_h. This information is "passed along" to the points x ∈ G_h^0 in the iterative scheme used to solve (4.1) by the controlled Markov chain. The information is passed along most quickly if the chain [while evolving under the optimal policy] reaches ∂G_h most quickly. Note that a large value of f will enhance the movement of the chain toward the boundary. We will discuss this aspect further below.
In constructing chains that will propagate the data most quickly, the flexibility in the choice of approximating chain can be used to great advantage. Since V_0^h acts as a terminal cost, a large value of V_0^h encourages the optimal control to move the chain
towards the boundary (in order to avoid this cost). Thus, the boundary data can be learned and propagated back into the interior quickly. On the other hand, a small value of V_0^h gives no incentive for the chain to seek the boundary. The accumulated running cost eventually directs the process towards the boundary, but it may take a large number of iterations to do so, and the convergence will be especially slow if the running cost can be near zero. The extreme case of a running cost that is not bounded away from zero is considered in Subsection 15.3.3. There the choice of a proper initial condition is critical, and owing to nonuniqueness the specification of the initial condition is an essential part of the algorithm formulation. With this discussion in mind, we use large initial conditions for all the following examples.
and with the obvious modification for the Gauss-Seidel algorithm. Here B is any upper bound for V(x). The additional minimization is needed to guarantee that V_j^h(x) is monotonically nonincreasing in j. Since B is larger than V(·) on G, this additional minimization does not alter the value of V(x), since it is never invoked by any optimal trajectory.
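A sketch of the resulting procedure (our own illustration, not the book's code; `update(V, x)` again stands in for the dynamic programming update at grid point x and is assumed to handle boundary and stopping values internally):

```python
def solve_with_large_initial_condition(points, update, B, tol=1e-3):
    """Value iteration started at V_0 = B, with the extra min{., B}
    truncation described above, run until successive iterates differ
    by less than tol at every interior grid point."""
    V = {x: B for x in points}
    while True:
        V_new = {x: min(update(V, x), B) for x in points}
        if max(abs(V_new[x] - V[x]) for x in points) < tol:
            return V_new
        V = V_new
```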
The ordering of the states is relevant for the performance of the Gauss-Seidel iteration (see Subsection 6.2.4). In general, the states should be ordered in such a way that the iteration goes against the tendency of the flow as much as possible. However, since this is not known a priori, it is best to alternate between several reasonable orderings of the state variables. For example, for problems in two dimensions the iteration is performed by alternating between the following four orderings of states: (i) from top to bottom, and within each row from left to right; (ii) from bottom to top, and within each row from right to left; (iii) from left to right, and within each column from top to bottom; and (iv) from right to left, and within each column from bottom to top. The ordering for three dimensional problems is done similarly.
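In code, the four orderings can be generated as follows (a sketch; the grid indexing is hypothetical, with point (i, j) denoting column i and row j). Each Gauss-Seidel pass then traverses the grid in the next ordering in the cycle, reusing values updated earlier in the same pass:

```python
def sweep_orderings(nx, ny):
    """Yield the four alternating 2-d sweep orderings described above."""
    rows, cols = range(ny), range(nx)
    yield [(i, j) for j in rows for i in cols]                      # (i)  top-bottom, left-right
    yield [(i, j) for j in reversed(rows) for i in reversed(cols)]  # (ii) bottom-top, right-left
    yield [(i, j) for i in cols for j in rows]                      # (iii) left-right, top-bottom
    yield [(i, j) for i in reversed(cols) for j in reversed(rows)]  # (iv) right-left, bottom-top
```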
In all the tables, we use m to denote 1/h. Of particular note is the
independence (with respect to m) of the number of iterations required for
convergence by the Gauss-Seidel algorithm.
    k(x, α) = (1/4)|α|² + 1,

in which case the dynamics are simply φ̇ = u. We adopt the latter representation since it is the one that is best suited for the numerical approximations described in Subsection 15.2.2.
We first analyze a two-dimensional problem on the set

The value function for this problem is defined over 5 different regions, as shown in Figure 15.7. The derivative of the value function has discontinuities over the boundaries that separate these regions, resulting in sharp edges in the graph of the value function. As can be seen in Figure 15.6, the numerical approximation preserves the sharp corners of the figure. Nevertheless, the error in the approximation is highest at points where these sharp edges occur (Figure 15.8).
The approximation results are provided in Table 15.1. The leftmost col-
umn corresponds to the number of grid points on each half-axis. The same
maximum errors were obtained with a Gauss-Seidel procedure with only
5 iterations irrespective of m. The iterative scheme was applied until the
maximum difference between successive iterates was less than .001. The
same stopping criterion is also used for the other examples discussed in
this section.
Example 2: Let G be the open unit cube in ℝ³. Consider the running cost

where g(x₁, 1, x₃), g(x₁, −1, x₃), and g(x₁, x₂, 1) denote the boundary values specified on the corresponding faces. Results for a Gauss-Seidel iteration are given in Table 15.2. The maximum error is obtained by comparing the approximation with the true solution, which is given piecewise (one expression when a₂ = −πx₁ and another otherwise), with

    b₁(x) = −2πx₁ + β sin πx₂

and γ > 0. The "π" scaling above is convenient so that the set of interest
can be taken to be the unit square.
Table 15.3 gives the results for a Gauss-Seidel approximation with γ = 0.001 and β = 1. In the table, we also record, in the rightmost column, the successive differences of the approximations as a function of m.
In other words, r(x) is the intersection of the unit sphere with the closed convex cone generated by r_j(x) for all j such that x ∈ ∂G_j. Our model then becomes

    F(x, v, p, X) = sup_{α∈U} [ −(1/2) tr[X a(x)] − p'b(x, α) + βv − k(x, α) ].
    F*(x, v, p, X) = F(x, v, p, X) ∨ max{−r'p : r ∈ r(x)},
    F_*(x, v, p, X) = F(x, v, p, X) ∧ min{−r'p : r ∈ r(x)}

for x ∈ ∂G.
Remarks. Note that in the definition given above, subsolutions and supersolutions are required to be only semicontinuous and not continuous. This turns out to allow a considerable simplification in the proof of convergence of schemes. The technique we will use originates in the papers of Barles and Perthame [6, 7] and Ishii [77, 76].
A1.2. For each x ∈ ∂G, the convex hull of the set r(x) does not contain the origin. For each x ∈ ∂G, let J(x) = {j : x ∈ ∂G_j}. We may assume without loss of generality that J(x) = {1, ..., m} for some m ≤ k. Set v_ij = |n_i'r_j(x)| − δ_ij, where δ_ij = 0 if i ≠ j and δ_ij = 1 if i = j. Then for each x ∈ ∂G, we assume that the spectral radius of the m × m matrix V = (v_ij) is strictly less than one.
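This condition is easy to test numerically at a given corner; a minimal sketch (our own helper, not from the book; the arrays n and r hold the inward normals n_i and reflection directions r_j(x) as rows):

```python
import numpy as np

def spectral_radius_condition(n, r):
    """Return True if the matrix V with entries |n_i' r_j| - delta_ij
    has spectral radius strictly less than one, as required by (A1.2)."""
    m = n.shape[0]
    V = np.abs(n @ r.T) - np.eye(m)
    return np.max(np.abs(np.linalg.eigvals(V))) < 1.0
```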
Theorem 1.1. Assume (A1.1) and (A1.2). Then the following comparison result holds. If V*(·) and V_*(·) are respectively a subsolution and a supersolution to (1.2), then

    V*(x) ≤ V_*(x)    (1.5)

for all x ∈ G. Define V(x) by (1.1). If V(·) is continuous, then it is a solution to (1.2).
There are many sets of conditions that will guarantee that V(·) defined by (1.1) is continuous. Although we will not prove it, the continuity of V(·) follows under (A1.1) and (A1.2). A proof can be based on the methods of weak convergence and the uniqueness results given in [41]. We note that the proof of continuity of V(·) must be based on the representation (1.1). The continuity of V(·) is related to the continuity of the total cost under the measure on the path space induced by an optimally (or ε-optimally) controlled process. This continuity has been an important consideration for the probabilistic approach described previously.
In keeping with the objectives of this chapter, we will not give a proof of Theorem 1.1. Instead, we will simply piece together some existing results in the literature. The comments that are given are intended to outline what is needed to apply uniqueness results for a class of PDEs to prove convergence of numerical schemes for a related control problem. The assumptions we have made on G and r(·) make the setup used here a special case of the general result given in [43]. These conditions relate directly to the reflected diffusion and need no further discussion.

In the proof of an inequality such as (1.5), it is most natural to place conditions on F(·, ·, ·, ·). It is proved in [43] that the comparison principle of Theorem 1.1 holds under the following conditions.
2. F(·, ·, ·, ·) is continuous on G × ℝ × ℝ^k × S^k.    (1.7)

    −θ [ I 0 ; 0 I ] ≤ [ X 0 ; 0 Y ] ≤ θ [ I −I ; −I I ]
(As noted previously there are many equivalent definitions of viscosity so-
lution. In particular, the definition used in [43] is equivalent to the one used
here.)
Thus, in order to make use of the results of [43], we must describe conditions on the components appearing in the statement of the control problem (b(·), a(·), and k(·, ·)) that are sufficient to guarantee (1.6)-(1.9). Property 1 is called "degenerate ellipticity" and follows easily from the definition of F(·, ·, ·, ·) (it is also a consequence of property 3, as remarked in [30]). Properties 2 and 4 are rather simple consequences of (A1.1). Property 3 is the most difficult to verify and, in particular, requires the Lipschitz continuity conditions given in (A1.1). We omit the proof, and instead refer the reader to [30].
The last issue to be resolved is whether or not V(·) as defined by (1.1) is indeed a solution to (1.2). It turns out that the main difficulty here is in verifying that V(·) satisfies the dynamic programming principle: for δ > 0 and x ∈ G,

If we assume that V(·) is continuous and that (1.10) holds, then minor modifications of the proof of Theorem 3.1 in [58, Chapter 5] show that V(·) is indeed a solution of (1.2).
Thus, all that needs to be verified is (1.10). For our particular problem the
process is never stopped, and the dynamic programming equation follows
from the Markov property and the definition of V(·).
    lim sup_{h→0} S^h(y_h, φ(y_h) + δ_h, φ(·) + δ_h) ≤ F*(x, φ(x), φ_x(x), φ_xx(x))

and

and
for x ∈ G_h^0; there are ε₁ > 0, c₁ > 0, and c₂(h) → 0 as h → 0 such that

    E_x^{h,α} Δξ_n^h ∈ { rθ + o(h) : c₂(h) ≥ θ ≥ c₁h, r ∈ r(x) },    (2.4)
where the q^h(x, y|α) depend on b(·, ·) and a(·). From (A2.1), we deduce that q^h(x, y|α) ≥ 0. Define Q^h(x, α) = Σ_y q^h(x, y|α). If (2.8) is to be meaningful, then we would expect Q^h(x, α) ≠ 0. By defining Δt^h(x, α) = 1/Q^h(x, α) and p^h(x, y|α) = q^h(x, y|α)/Q^h(x, α), we put (2.8) into the form of (2.7). Note that the p^h(x, y|α) so defined are the transition probabilities of a controlled Markov chain.
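The normalization itself is mechanical; a minimal sketch (our illustration; `q_row` holds one row y ↦ q^h(x, y|α) of the coefficients):

```python
def normalize(q_row):
    """Convert nonnegative coefficients q^h(x, .|a) into transition
    probabilities p^h(x, .|a) and the interval dt^h(x, a) = 1/Q^h(x, a),
    exactly as described in the text."""
    Q = sum(q_row.values())
    assert Q > 0, "Q^h(x, a) must be nonzero for (2.8) to be meaningful"
    return {y: q / Q for y, q in q_row.items()}, 1.0 / Q
```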
Owing to the presence of the supremum operation in (2.7) it is difficult to directly draw a conclusion on the properties that the p^h(x, y|α) must satisfy. However, by choosing the linear function φ(x) = p'x, we see that (A2.3) implies

    lim_{h→0} sup_{α∈U} [ −(1/Δt^h(x, α)) Σ_y p^h(x, y|α)(y − x)'p − k(x, α) ]
        = sup_{α∈U} [ −b(x, α)'p − k(x, α) ].    (2.9)
[35] J. G. Dai and J. M. Harrison. The QNET method for two-moment analysis of closed manufacturing systems. Ann. Appl. Probab., 3:968-1012, 1993.

[36] J. G. Dai, D. H. Yeh, and C. Zhou. The QNET method for reentrant queueing networks with priority disciplines. Operations Research, 45:610-623, 1997.

[43] P. Dupuis and H. Ishii. On oblique derivative problems for fully nonlinear second-order elliptic PDE's on domains with corners. Hokkaido Math. J., 20:135-164, 1991.

[61] S. J. Gass. Linear Programming. McGraw-Hill, New York, NY, fifth edition, 1985.

[80] R. Jensen. The maximum principle for viscosity solutions of fully nonlinear second order partial differential equations. Arch. Rat. Mech. Anal., 101:1-27, 1988.

[115] P.-L. Lions and A.-S. Sznitman. Stochastic differential equations with reflecting boundary conditions. Comm. Pure Appl. Math., 37:511-553, 1984.