
Stochastic Mechanics
Random Media
Signal Processing and Image Synthesis
Mathematical Economics and Finance
Stochastic Optimization
Stochastic Control

Applications of Mathematics
Stochastic Modelling and Applied Probability
24

Edited by I. Karatzas
M. Yor

Advisory Board P. Brémaud
E. Carlen
W. Fleming
D. Geman
G. Grimmett
G. Papanicolaou
J. Scheinkman

Springer Science+Business Media, LLC


Applications of Mathematics

1 Fleming/Rishel, Deterministic and Stochastic Optimal Control (1975)


2 Marchuk, Methods of Numerical Mathematics, Second Ed. (1982)
3 Balakrishnan, Applied Functional Analysis, Second Ed. (1981)
4 Borovkov, Stochastic Processes in Queueing Theory (1976)
5 Liptser/Shiryayev, Statistics of Random Processes 1: General Theory (1977)
6 Liptser/Shiryayev, Statistics of Random Processes II: Applications (1978)
7 Vorob'ev, Game Theory: Lectures for Economists and Systems Scientists
(1977)
8 Shiryayev, Optimal Stopping Rules (1978)
9 Ibragimov/Rozanov, Gaussian Random Processes (1978)
10 Wonham, Linear Multivariable Control: A Geometric Approach, Third Ed.
(1985)
11 Hida, Brownian Motion (1980)
12 Hestenes, Conjugate Direction Methods in Optimization (1980)
13 Kallianpur, Stochastic Filtering Theory (1980)
14 Krylov, Controlled Diffusion Processes (1980)
15 Prabhu, Stochastic Storage Processes: Queues, Insurance Risk, Dams,
and Data Communication, Second Ed. (1998)
16 Ibragimov/Has'minskii, Statistical Estimation: Asymptotic Theory (1981)
17 Cesari, Optimization: Theory and Applications (1982)
18 Elliott, Stochastic Calculus and Applications (1982)
19 Marchuk/Shaidourov, Difference Methods and Their Extrapolations (1983)
20 Hijab, Stabilization of Control Systems (1986)
21 Protter, Stochastic Integration and Differential Equations (1990)
22 Benveniste/Metivier/Priouret, Adaptive Algorithms and Stochastic
Approximations (1990)
23 Kloeden/Platen, Numerical Solution of Stochastic Differential Equations
(1992)
24 Kushner/Dupuis, Numerical Methods for Stochastic Control Problems
in Continuous Time, Second Ed. (2001)
25 Fleming/Soner, Controlled Markov Processes and Viscosity Solutions
(1993)
26 Baccelli/Brémaud, Elements of Queueing Theory (1994)
27 Winkler, Image Analysis, Random Fields, and Dynamic Monte Carlo
Methods: An Introduction to Mathematical Aspects (1994)
28 Kalpazidou, Cycle Representations of Markov Processes (1995)
29 Elliott/Aggoun!Moore, Hidden Markov Models: Estimation and Control
(1995)
30 Hernández-Lerma/Lasserre, Discrete-Time Markov Control Processes:
Basic Optimality Criteria (1996)
31 Devroye/Györfi/Lugosi, A Probabilistic Theory of Pattern Recognition (1996)
32 Maitra/Sudderth, Discrete Gambling and Stochastic Games (1996)

(continued after index)


Harold J. Kushner Paul Dupuis

Numerical Methods for


Stochastic Control Problems
in Continuous Time
Second Edition

With 40 Figures

Springer
Harold J. Kushner
Paul Dupuis
Division of Applied Mathematics
Brown University
Providence, RI 02912, USA

Managing Editors
I. Karatzas
Departments of Mathematics and Statistics
Columbia University
New York, NY 10027, USA

M. Yor
CNRS, Laboratoire de Probabilités
Université Pierre et Marie Curie
4, Place Jussieu, Tour 56
F-75252 Paris Cedex 05, France

Mathematics Subject Classification (2000): 93-02, 65U05, 90C39, 93E20

Library of Congress Cataloging-in-Publication Data


Kushner, Harold J. (Harold Joseph), 1933-
Numerical methods for stochastic control problems in continuous time / Harold J.
Kushner, Paul Dupuis. - 2nd ed.
p. cm. - (Applications of mathematics ; 24)
Includes bibliographical references and index.

1. Stochastic control theory. 2. Markov processes. 3. Numerical analysis. I. Dupuis,


Paul. II. Title. III. Series.
QA402.37.K87 2001
003.76-dc21 00-061267

Printed on acid-free paper.

© 1992, 2001 Springer Science+Business Media New York


Originally published by Springer-Verlag New York Inc. in 2001
Softcover reprint of the hardcover 2nd edition 2001

All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer Science+Business Media, LLC), except for brief
excerpts in connection with reviews or scholarly analysis. Use in connection with any form
of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.

The use of general descriptive names, trade names, trademarks, etc., in this publication, even
if the former are not especially identified, is not to be taken as a sign that such names, as
understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely
by anyone.

Production managed by Allan Abrams; manufacturing supervised by Erica Bresler.


Photocomposed copy prepared from the authors' LaTeX files.

9 8 7 6 5 4 3 2 1

SPIN 10780458
ISBN 978-1-4612-6531-3 ISBN 978-1-4613-0007-6 (eBook)
DOI 10.1007/978-1-4613-0007-6
To

Linda, Diana, and Nina

and

Suzanne, Alexander, and Nicole


Contents

Introduction 1

1 Review of Continuous Time Models 7


1.1 Martingales and Martingale Inequalities 8
1.2 Stochastic Integration . . . . . . . . . . 9
1.3 Stochastic Differential Equations: Diffusions . 14
1.4 Reflected Diffusions . 21
1.5 Processes with Jumps . . 28

2 Controlled Markov Chains 35


2.1 Recursive Equations for the Cost . . . . . . . 36
2.1.1 Stopping on first exit from a given set 36
2.1.2 Discounted cost . . . . . . . . . . . 38
2.1.3 Average cost per unit time . . . . 40
2.1.4 Stopping at a given terminal time 41
2.2 Optimal Stopping Problems 42
2.2.1 Discounted cost .. 43
2.2.2 Undiscounted cost . 47
2.3 Discounted Cost . . . . . . 48
2.4 Control to a Target Set and Contraction Mappings 50
2.5 Finite Time Control Problems . . . . . . . . . . . . 52

3 Dynamic Programming Equations 53


3.1 Functionals of Uncontrolled Processes 54

3.1.1 Cost until a target set is reached 54


3.1.2 The discounted cost . . . . . . 56
3.1.3 A reflecting boundary . . . . . . 57
3.1.4 The average cost per unit time 58
3.1.5 The cost over a fixed finite time interval 59
3.1.6 A jump diffusion example . . . 59
3.2 The Optimal Stopping Problem . . . . . . . . . 60
3.3 Control Until a Target Set Is Reached . . . . . 61
3.4 A Discounted Problem with a Target Set and Reflection 65
3.5 Average Cost Per Unit Time . . . . . . . . . . . . . . . 65

4 Markov Chain Approximation Method: Introduction 67


4.1 Markov Chain Approximation . 69
4.2 Continuous Time Interpolation 72
4.3 A Markov Chain Interpolation 74
4.4 A Random Walk Approximation 78
4.5 A Deterministic Discounted Problem 80
4.6 Deterministic Relaxed Controls . . . 85

5 Construction of the Approximating Markov Chains 89


5.1 One Dimensional Examples . . . . . . . . . . . . . 91
5.2 Numerical Simplifications . . . . . . . . . . . . . . 99
5.2.1 Eliminating the control dependence in the
denominators of p^h(x, y|α) and Δt^h(x, α) 99
5.2.2 A useful normalization if p^h(x, x|α) ≠ 0 . . 100
5.2.3 Alternative Markov chain approximations for
Example 4 of Section 5.1: Splitting the operator 103
5.3 The General Finite Difference Method . . . . . . . . . . 106
5.3.1 The general case . . . . . . . . . . . . . . . . . . 108
5.3.2 A two dimensional example: Splitting the operators 112
5.4 A Direct Construction . . . . . . . . . . . . . . . . 113
5.4.1 An introductory example . . . . . . . . . . 114
5.4.2 Example 2. A degenerate covariance matrix 117
5.4.3 Example 3. . . . . 119
5.4.4 A general method 121
5.5 Variable Grids . . . . . . 122
5.6 Jump Diffusion Processes 127
5.6.1 The jump diffusion process model: Recapitulation 127
5.6.2 Constructing the approximating Markov chain 128
5.6.3 A convenient representation of {ξ_n^h, n < ∞}
and ψ^h(·) . . . . . 131
5.7 Reflecting Boundaries . . . . . . . . . . . . . . . . 132
5.7.1 General discussion . . . . . . . . . . . . . . 132
5.7.2 Locally consistent approximations on the boundary. 136

5.7.3 The continuous parameter Markov chain


interpolation . . . . . . . . . 138
5.7.4 Examples . . . . . . . . . . . 138
5.7.5 The reflected jump diffusion . 141
5.8 Dynamic Programming Equations . 141
5.8.1 Optimal stopping . . . . . . . 141
5.8.2 Control until exit from a compact set 144
5.8.3 Reflecting boundary . . . . . . . . 145
5.9 Controlled and State Dependent Variance .. 148

6 Computational Methods for Controlled Markov Chains 153


6.1 The Problem Formulation . . . . . . . 154
6.2 Classical Iterative Methods . . . . . . 156
6.2.1 Approximation in policy space 156
6.2.2 Approximation in value space . 158
6.2.3 Combined approximation in policy space and
approximation in value space . . . . . . . . . 160
6.2.4 The Gauss-Seidel method: Preferred orderings
of the states . . . . . 161
6.3 Error Bounds . . . . . . . . . . . . . 164
6.3.1 The Jacobi iteration . . . . . 164
6.3.2 The Gauss-Seidel procedure . 165
6.4 Accelerated Jacobi and Gauss-Seidel Methods . 166
6.4.1 The accelerated and weighted algorithms 166
6.4.2 Numerical comparisons between the basic and
accelerated procedures 168
6.4.3 Example . . . . . . . . . . 170
6.5 Domain Decomposition . . . . . 171
6.6 Coarse Grid-Fine Grid Solutions 174
6.7 A Multigrid Method . . . . . . . 176
6.7.1 The smoothing properties of the
Gauss-Seidel iteration 176
6.7.2 A multigrid method 179
6.8 Linear Programming . . . . . 183
6.8.1 Linear programming . 183
6.8.2 The LP formulation of the Markov chain
control problem . . . . . . . . . . . . . . . 186

7 The Ergodic Cost Problem: Formulation and Algorithms 191


7.1 Formulation of the Control Problem 192
7.2 A Jacobi Type Iteration . . . . 196
7.3 Approximation in Policy Space 197
7.4 Numerical Methods . . . . 199
7.5 The Control Problem. . . 201
7.6 The Interpolated Process 206

7.7 Computations . . . . . . . . . . . . . . . . . . . . . . . . 207


7.7.1 Constant interpolation intervals. . . . . . . . . . 207
7.7.2 The equation for the cost (5.3) in centered form 209
7.8 Boundary Costs and Controls . . . . . . . . . . . . . . . 213

8 Heavy Traffic and Singular Control 215


8.1 Motivating Examples . . . . . . . . . . . . . . . . . 216
8.1.1 Example 1. A simple queueing problem . . 216
8.1.2 Example 2. A heuristic limit for Example 1 217
8.1.3 Example 3. Control of admission, a singular
control problem . . . . . . . . . . . . . . . . . 221
8.1.4 Example 4. A multidimensional queueing or produc-
tion system under heavy traffic: No control . . . . . 223
8.1.5 Example 5. A production system in heavy traffic with
impulsive control . . . . . . . . . . . . . . . . . . . . 228
8.1.6 Example 6. A two dimensional routing
control problem . . . 229
8.1.7 Example 7 . . . . . . 233
8.2 The Heavy Traffic Problem 234
8.2.1 The basic model .. 234
8.2.2 The numerical method . 236
8.3 Singular Control . . . . . . . . 240

9 Weak Convergence and the Characterization


of Processes 245
9.1 Weak Convergence . . . . . . . . . . . . . . 246
9.1.1 Definitions and motivation . . . . . 246
9.1.2 Basic theorems of weak convergence 247
9.2 Criteria for Tightness in D^k[0, ∞) 250
9.3 Characterization of Processes 251
9.4 An Example . . . 253
9.5 Relaxed Controls . 262

10 Convergence Proofs 267


10.1 Limit Theorems .. 268
10.1.1 Limit of a sequence of controlled diffusions .. 268
10.1.2 An approximation theorem for relaxed controls 275
10.2 Existence of an Optimal Control . . 276
10.3 Approximating the Optimal Control . . . . . . . . . 282
10.4 The Approximating Markov Chain . . . . . . . . . . 286
10.4.1 Approximations and representations for ψ^h(·) 287
10.4.2 The convergence theorem for the interpolated chains 290
10.5 Convergence of the Costs 291
10.6 Optimal Stopping . . . . . . . . . . . . . . . . . . . . . . . . 296

11 Convergence for Reflecting Boundaries, Singular


Control, and Ergodic Cost Problems 301
11.1 The Reflecting Boundary Problem . . . . . . . . . . . . . . 302
11.1.1 The system model and Markov chain approximation 302
11.1.2 Weak convergence of the approximating processes 306
11.2 The Singular Control Problem 315
11.3 The Ergodic Cost Problem . . . . . . . . . . . . . . . . . 320

12 Finite Time Problems and Nonlinear Filtering 325


12.1 Explicit Approximations: An Example 326
12.2 General Explicit Approximations . . . 330
12.3 Implicit Approximations: An Example 331
12.4 General Implicit Approximations 333
12.5 Optimal Control Computations . 335
12.6 Solution Methods . . . . . . . . . 337
12.7 Nonlinear Filtering . . . . . . . . 340
12.7.1 Approximation to the solution of the
Fokker-Planck equation . . . . . . . . 340
12.7.2 The nonlinear filtering problem: Introduction
and representation . . . . . . . . . . . . . . . 341
12.7.3 The approximation to the optimal filter for x(·), y(·) 345

13 Controlled Variance and Jumps 347


13.1 Controlled Variance: Introduction . 348
13.1.1 Introduction . . . . 348
13.1.2 Martingale measures 351
13.1.3 Convergence 354
13.2 Controlled Jumps . . . . . . 357
13.2.1 Introduction . . . . 357
13.2.2 The relaxed Poisson measure 361
13.2.3 Optimal controls . . . . . . . 364
13.2.4 Convergence of the numerical algorithm 365

14 Problems from the Calculus of Variations:


Finite Time Horizon 367
14.1 Problems with a Continuous Running Cost 368
14.2 Numerical Schemes and Convergence . . . . 371
14.2.1 Descriptions of the numerical schemes 372
14.2.2 Approximations and properties of the value function 373
14.2.3 Convergence theorems . . . . . . . . . . . . 378
14.3 Problems with a Discontinuous Running Cost . . . 384
14.3.1 Definition and interpretation of the cost on
the interface . . . . . . . . . . . . . . . . . 386
14.3.2 Numerical schemes and the proof of convergence 388

15 Problems from the Calculus of Variations:


Infinite Time Horizon 401
15.1 Problems of Interest . . . . . . . . . . . . . 403
15.2 Numerical Schemes for the Case k(x, α) ≥ k_0 > 0 404
15.2.1 The general approximation . . . . . . . . 404
15.2.2 Problems with quadratic cost in the control 405
15.3 Numerical Schemes for the Case k(x, α) ≥ 0 409
15.3.1 The general approximation . . 410
15.3.2 Proof of convergence . . . . . . . . . 411
15.3.3 A shape from shading example . . . 422
15.4 Remarks on Implementation and Examples 435

16 The Viscosity Solution Approach 443


16.1 Definitions and Some Properties of Viscosity Solutions 444
16.2 Numerical Schemes . . 449
16.3 Proof of Convergence . . . . . . . . . . . . . . . . . . . 453

References 455

Index 467

List of Symbols 473
Introduction

Changes in the second edition. The second edition differs from the first
in that there is a full development of problems where the variance of the
diffusion term and the jump distribution can be controlled. Also, a great
deal of new material concerning deterministic problems has been added,
including very efficient algorithms for a class of problems of wide current
interest.

This book is concerned with numerical methods for stochastic control


and optimal stochastic control problems. The random process models of
the controlled or uncontrolled stochastic systems are either diffusions or
jump diffusions. Stochastic control is a very active area of research and new
problem formulations and sometimes surprising applications appear regu-
larly. We have chosen forms of the models which cover the great bulk of the
formulations of the continuous time stochastic control problems which have
appeared to date. The standard formats are covered, but much emphasis is
given to the newer and less well known formulations. The controlled process
might be either stopped or absorbed on leaving a constraint set or upon
first hitting a target set, or it might be reflected or "projected" from the
boundary of a constraining set. In some of the more recent applications
of the reflecting boundary problem, for example the so-called heavy traffic
approximation problems, the directions of reflection are actually discontin-
uous. In general, the control might be representable as a bounded function
or it might be of the so-called impulsive or singular control types. Both the
"drift" and the "variance" might be controlled. The cost functions might
be any of the standard types: Discounted, stopped on first exit from a set,
finite time, optimal stopping, average cost per unit time over the infinite
time interval, and so forth. There might be separate costs when the process
is on the boundary and when it is in the interior of the set of interest. In
fact all of the standard cost functionals can be dealt with by the meth-
ods to be presented. There is a close connection between approximation
methods for stochastic control and those for optimal nonlinear filtering,
and approximation methods for the latter problem are also discussed.
The class of methods to be dealt with is referred to generically as the
Markov chain approximation method. It is a powerful and widely usable
set of ideas for numerical and other approximation problems for either
controlled or uncontrolled stochastic processes, and it will be shown that
it has important applications to deterministic problems as well. The ini-
tial development of the approximation method and the convergence proofs
appeared in the first author's 1977 book. Since that time new classes of
problems have arisen to which the original proofs could not be applied di-
rectly, the techniques of approximation and mathematical proof have been
considerably streamlined, and also extended to cover a large part of the
new problems of interest in continuous time stochastic control. In addition,
many new techniques for actually doing the computations have been de-
veloped. The present book is a revision and updating of the 1992 edition.
There is much new material on the problems of jump and variance control
and on deterministic problems as well, with possible discontinuities in the
data.
The basic idea of the Markov chain approximation method is to approx-
imate the original controlled process by an appropriate controlled Markov
chain on a finite state space. One also needs to approximate the original cost
function by one which is appropriate for the approximating chain. These
approximations should be chosen such that a good numerical approxima-
tion to the associated control or optimal control problem can be obtained
with a reasonable amount of computation. The criterion which must be
satisfied by the process approximation is quite mild. It is essentially what
we will call "local consistency." Loosely speaking, this means that from a
local point of view, the conditional mean and covariance of the changes in
state of the chain are proportional to the local mean drift and covariance
for the original process. Such approximations are readily obtained by a va-
riety of methods and are discussed extensively in Chapters 4 and 5. The
numerical problem is then to solve the problem for the approximating con-
trolled chain. Methods for doing this are covered in detail in Chapters 6 to
8, with the basic concepts being in Chapter 6. One needs to prove that the
solutions to the problems with the approximating chain actually converge
to the correct value as some approximation parameter goes to zero. One
of the great advantages of the approach is that this can often be done by
probabilistic methods which do not require the use of any of the analytical
properties of the actual solution. This is particularly important since for
many classes of problems, little is known about the analytical properties of
the Bellman equations.
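To make the local consistency idea concrete, the following sketch constructs transition probabilities and an interpolation interval for an approximating chain for a one dimensional controlled diffusion dx = b(x, α)dt + σ dw on a grid of spacing h, using a construction of the kind developed in Chapter 5; the particular coefficient function and numbers are placeholder choices, not taken from the text.

```python
import numpy as np

def transition_probabilities(x, alpha, h, b, sigma):
    """Locally consistent chain for the scalar model dx = b(x,a)dt + sigma dw.

    Returns p^h(x, x+h|a), p^h(x, x-h|a) and the interpolation interval
    Delta t^h(x, a).  The chain's conditional mean increment equals
    b(x,a)*dt and its conditional variance is sigma^2*dt plus a term of
    order h*dt, which is the local consistency requirement described above.
    """
    bx = b(x, alpha)
    denom = sigma**2 + h * abs(bx)               # normalizing factor
    p_up = (sigma**2 / 2.0 + h * max(bx, 0.0)) / denom
    p_down = (sigma**2 / 2.0 + h * max(-bx, 0.0)) / denom
    dt = h**2 / denom                            # Delta t^h(x, a)
    return p_up, p_down, dt

# Placeholder data: b(x, a) = a - x, sigma = 0.5, grid spacing h = 0.1.
p_up, p_down, dt = transition_probabilities(
    x=0.3, alpha=1.0, h=0.1, b=lambda x, a: a - x, sigma=0.5)
print(p_up + p_down)                      # the probabilities sum to one
print(0.1 * (p_up - p_down), 0.7 * dt)    # mean increment vs. b(x,a)*dt
```

Quantities of this kind, transition probabilities p^h and interpolation intervals Δt^h, are what the approximations of Chapters 4 and 5 construct and what the numerical algorithms of Chapter 6 then operate on.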
The book is written on two levels, so that the methods of actual ap-
proximation and practical use of the algorithms can be read without any
involvement with the mathematics of the convergence proofs. An effort is
made to motivate the development of the algorithms in terms of the prop-
erties of the original process of interest, but this is on a purely intuitive
level, and the various properties which are used should be intuitively nat-
ural. Thus the book should be accessible to a reader with only a formal
acquaintance with the properties of diffusion and jump diffusion processes.
Indeed, one of the primary purposes of the book is the encouragement of
the use and the development of the actual algorithms on a wide variety of
practical problems. We note that the methods are not restricted to optimal
control problems. They can be used for the calculation of approximations
to a large class of functionals of processes of the diffusion or jump diffusion
type. The probabilistic nature of the process of approximation and proof
allows us to use our physical intuition concerning the original problem in
all phases of the development. The reader should keep in mind that we
approximate the control problem and not the formal dynamical equation
for the optimal value function.
The proofs of convergence for the Markov chain approximation method
are purely probabilistic. One finds a suitable continuous time interpolation
of the optimally controlled approximating chain, and shows that in the
sense of weak or distributional convergence, there is a convergent subse-
quence whose limit is an optimally controlled process of the original diffu-
sion or jump diffusion type, and with the original cost function and bound-
ary data. The methods of proof are from the theory of weak convergence
of probability measures. The weak convergence method provides a unifying
approach for all the problems of interest.
We also note that the deterministic problem is a special case of what is
of interest here. All of the algorithms and approximations can be used for
deterministic problems where one wants a feedback control. The basic fact
which is needed for the proofs is the local consistency referred to above. In
fact, in Chapters 14 and 15, we see how it can be used for deterministic
problems which arise in the calculus of variations, nonlinear Hamilton-
Jacobi equations, and elsewhere, with possibly discontinuous data.
From a formal point of view, as is well known and also discussed in
Chapter 3, the optimal value functions for many stochastic control prob-
lems with a diffusion process model can be represented as the solution to
either a highly nonlinear and possibly degenerate partial differential equa-
tion of the elliptic or parabolic type with appropriate boundary data, or
possibly as a variational inequality with similar degeneracies and boundary
data. If the underlying process is of the controlled jump diffusion type then
the partial differential equation will be replaced by a nonlinear partial inte-
grodifferential equation. Because of this formal "PDE-type" structure, the
current literature in numerical analysis offers many useful ideas concerning
both the approximation of the original problem and the solution of the ap-
proximating problem, at least in the simpler cases. This influence is partic-
ularly strong at the level of the actual computational algorithms, as might
be seen by scanning Chapter 6. But, except for a few special situations, the
current literature is not adequate to deal with the convergence proofs, and
of course the convergence proofs are the ultimate guide for the determina-
tion of the actual approximations which are to be used for the computation.
This is particularly true for the cases where the formal dynamical equa-
tion for the optimal value function does not fit the classical PDE models
with which the numerical analysis literature has concerned itself. These
new cases would include problems involving discontinuous boundary data,
as occurs in the so-called heavy traffic problem; problems with singular
controls, controlled variance, ergodic problems, or generally for degenerate
models. By degenerate, we mean that the covariance matrix of the diffusion
part of the model is not strictly positive definite, and we note that this is
a quite common case in applications. One might sometimes use the formal
PDE or variational inequality as a guide to get useful approximations in
certain cases, but the ultimate proofs do not use these structures or any of
their properties.
Chapter 2 contains an outline of results in the control of Markov chains
on a finite state space which will be needed later in the book. The ap-
proximations are developed and the numerical algorithms are explained in
Chapters 4 to 8 and in the first part of Chapter 12. The mathematical
proofs are in Chapters 10-15 with many of the basic ideas being in Chapter
10. Chapters 1 and 9 contain a survey of the mathematical results which
will be needed in Chapters 10-15. Chapter 3 contains a formal discussion of
the continuous time control problem. It introduces some of the basic models
for the cost function and illustrates the use of the principle of optimality to
formally get the nonlinear PDEs which are satisfied by the optimal value
function. The chapter is for motivational purposes only, since the results are
formal and are not explicitly used except for further motivation. But hav-
ing the formal PDEs is helpful when discussing the approximations, since
it can be used to show us that we are on familiar ground. Chapters 2 and
4 to 8 can be read independently of Chapters 1, 3 and 9-16. In particular,
the methods for getting the approximations and numerical algorithms in
Chapters 4-6, and in the relevant "algorithmic" parts of Chapters 7, 8 and
12-15 can be read without familiarity with the more theoretical chapters.
The Markov chain approximation method is introduced in Chapter 4.
This is done via some simple problems, but all of the main issues which
will need to be considered are introduced. A deterministic example is de-
veloped in detail, since that allows us to demonstrate the ideas with the
least encumbrance. To show us that the intuition of numerical analysis
remains useful, the role of finite difference approximations in getting the
approximating chain is emphasized.
Chapter 5 contains an extensive exposition of methods for getting ap-
propriate approximating chains. It will be seen that this is essentially an
exercise in common sense, and that we have considerable freedom. The
methods which are discussed are illustrative of essentially automatic and
easy to use procedures, but they also illustrate the general principles which
one ought to follow in the development of extensions. There is also a discus-
sion of the problem of numerical noise when the variance is also controlled.
Chapter 6 develops the basic computational methods which are used to
solve the approximating optimal control problems. Many of the standard
approaches to the solution of large systems of linear equations appear, since
the standard approximation in policy space method leads to the solution of
a sequence of such linear problems. In Chapter 7, we extend the discussion
of Chapter 6 for the average cost per unit time problem. Chapter 8 deals
with the so-called heavy traffic and singular control problems. These classes
of problems are rather recent arrivals on the stochastic control scene, but
they model some very important applications, and the numerical analysis
questions are particularly interesting. Due to this, we discuss a number of
specific examples. Then the ideas of Chapters 5 and 6 are extended. Chap-
ter 12 is concerned with problems which are of interest over a finite time
interval only, as well as with the approximation problem for the optimal
nonlinear filter. In the course of development of the techniques for the nu-
merical problem, various side results appear which have a broader use in
stochastic process theory; for example, the simple proof of existence of a
solution to the stochastic differential equation with a reflecting boundary
in Chapter 11.
Chapter 13 is new and gives the convergence proofs for the variance
and jump control problems. The controlled variance problem arises in re-
cent applications to mathematical finance and the controlled jump problem
arises in applications in telecommunications. The proper treatment of these
problems requires that the basic models be extended by generalizing the
concepts of Wiener process and Poisson measure. Chapters 14 and 15 con-
sider a variety of deterministic problems that are not special cases of the
stochastic problems dealt with previously. Chapter 14 focuses on finite time
problems. It shows how to deal with problems where the control space is
unbounded and the difficult problem of deterministic dynamics with dis-
continuous cost. Chapter 15 considers problems over potentially infinite
time intervals. These problems occur in robust nonlinear control, large de-
viations, computer vision, and related problems involving the evolution
of surfaces and boundaries. An important class of problems considered is
those where the dynamics are linear in the control and the cost is quadratic
in the control, and very efficient approximations are developed in detail for
these problems. Chapter 16 briefly discusses the "viscosity solution"
approach, which provides an alternative method of proof in some cases.
The methods developed in this book allow us to do effective computations
on a wide variety of problems. Nevertheless, the entire field of numerical
stochastic control is in its infancy and much more effort is needed on all
phases of the area.
It is a pleasure to acknowledge the considerable help of Luiz Felipe Mar-
tins, John Oliensis, Felisa Vasquez-Abad and Jichuan Yang in the prepa-
ration of the first edition. We also thank John Oliensis for the figures in
Chapter 15. This work has been supported for many years by the National
Science Foundation and the Army Research Office.
1
Review of Continuous Time Models

In this book we will consider methods for numerically computing the value
function for certain classes of controlled continuous time stochastic and
deterministic processes. The purpose of the present chapter is to provide
an introduction to and some of the background material for controlled
diffusions and controlled jump diffusions. These types of processes include
many of the models that are commonly used. This section is only intended
to serve as a review of the main ideas and for purposes of reference. Other
models (e.g., singularly controlled diffusions) that are also of interest will
be introduced and elaborated on in the appropriate later sections of the
book.
Our main interest in the present chapter is in constructing and estab-
lishing certain properties of the processes. Chapter 9 will also deal with
important background material, such as alternative characterizations of
the processes and the theory of weak convergence. Section 1.1 presents the
definitions and fundamental inequalities of martingales. In Section 1.2, we
review integration with respect to the Wiener process and state the associ-
ated chain rule (Ito's formula). With the definition of the stochastic integral
and the appropriate martingale estimates in hand, in Sections 1.3 and 1.4
we define what is meant by a solution of a stochastic differential equation
and outline the proof of existence of solutions. The processes defined by the
solutions of these stochastic differential equations will serve as our models
of controlled continuous time processes with continuous sample paths. We
also discuss the notion of uniqueness of solutions that will be suitable for
our later work. We will first consider processes without control, and then
indicate the extension to the controlled case. For purposes of numerical
computation, it is usually necessary that the processes be constrained to
take values in some bounded set G. Thus, the original problem may have
to be modified to reflect this need. There are at least two considerations
that must be kept in mind when redefining the problem statement. The
most obvious requirement is that quantities one is interested in estimating
(e.g., certain expectations) should not be greatly perturbed. The second
requirement is that the resulting modification should be convenient with
respect to computations. One way to achieve this bounding of the state
space is to stop the process the first time it leaves some large but bounded
domain. At this time, a cost should be assigned that is approximately equal
to the total of the future costs if the process were not stopped. The de-
termination of an appropriate stopping cost can be difficult in practice. A
second method involves constraining the process without actually stopping
it. For diffusions, such a bounding of the state space can be achieved by
imposing a reflecting boundary condition on the boundary of G. For pro-
cesses involving jumps, one may simply project the process back into G in
some convenient way whenever it leaves that set. Besides being useful for
numerical purposes, such constrained or reflected processes are important
as basic models for many problems in stochastic systems theory. Exam-
ples and references are given in Chapter 8. A unified and natural method
for constructing and analyzing such constrained processes can be based on
use of the Skorokhod Problem (SP). In Section 1.4, we define a solution
to the Skorokhod Problem in a setting general enough to cover all of the
situations involving reflecting diffusions and projected processes that will
appear in the book. We consider several illustrative examples, including a
case that is convenient for the purpose of simply constraining a process to
a bounded set. We conclude the chapter in Section 1.5 with a discussion of
the analogous results for jump diffusions.

1.1 Martingales and Martingale Inequalities


Consider a probability space (Ω, F, P). A family of σ-algebras {F_t, t ≥ 0} is called a filtration on this probability space if F_s ⊂ F_t ⊂ F for all 0 ≤ s ≤ t. Let E_{F_t} and P_{F_t} denote expectation and probability conditioned on the σ-algebra F_t, respectively. Suppose C^k[0, T] denotes the space of continuous functions mapping [0, T] into R^k and that D^k[0, T] consists of those functions from [0, T] into R^k that are continuous from the right and have limits from the left. Let C^k[0, ∞) and D^k[0, ∞) denote the analogous path spaces for the interval [0, ∞). These spaces may be metrized so they are complete separable metric spaces [13]. We will drop k from the notation when k = 1.

Consider a stochastic process x(·) defined on (Ω, F, P) and taking values in the path space D[0, ∞). Then x(·) is said to be an F_t-martingale if x(t) is F_t-measurable, E|x(t)| < ∞ for all t ≥ 0, and if

E_{F_t} x(t + s) = x(t) w.p.1 for all t ≥ 0 and all s ≥ 0.   (1.1)

If the particular filtration is obvious or unimportant, then we will suppress


the prefix and refer to x( ·) simply as a martingale. We will refer to a vector
valued process as a vector valued martingale if each of its components is a
martingale with respect to the same filtration.
The importance of martingales is in part due to the bounds and inequal-
ities associated with them. Processes can often be decomposed as a sum of
a bounded variation term plus a martingale (with respect to some conve-
nient filtration). When performing calculations involving the process (e.g.,
obtaining bounds on moments), the bounded variation term is often easy
to handle. Thus, the decomposition is useful since estimates such as those
presented below can be used to bound the martingale part of the process.
The inequalities that we will find most useful in this book are the following. Let x(·) be an F_t-martingale that has right continuous sample paths. (All of the martingales encountered in this book will be of this type.) Then, for any c > 0, T ≥ 0, and 0 ≤ t ≤ T,

P_{F_t} { sup_{t≤s≤T} |x(s)| ≥ c } ≤ E_{F_t} |x(T)| / c,   (1.2)

E_{F_t} [ sup_{t≤s≤T} |x(s)|² ] ≤ 4 E_{F_t} |x(T)|².   (1.3)

See [83].
A random variable τ : Ω → [0, ∞] is called an F_t-stopping time if {τ ≤ t} ∈ F_t for all t ∈ [0, ∞). If x(·) is an F_t-martingale and τ is a uniformly bounded F_t-stopping time, then the stopped process x(t ∧ τ) is also an F_t-martingale. Thus, (1.2) and (1.3) also hold if we replace T by T ∧ τ, where τ is any F_t-stopping time.
If there exists a nondecreasing sequence {τ_n, n = 1, 2, ...} of F_t-stopping times such that τ_n → ∞ w.p.1 and such that for each n the stopped process x(t ∧ τ_n) is a martingale, then x(·) is called an F_t-local martingale.

1.2 Stochastic Integration


We are concerned in this chapter with reviewing the construction and var-
ious properties of some of the standard models used for continuous time
problems. One of the most important models for stochastic systems is the
stochastic differential equation (SDE) of the form

x(t) = x(0) + ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s).

Here x(·) is an R^k-valued process with continuous sample paths, w(·) is an R^n-valued process which serves as a "driving noise," and b(·) and σ(·) are vector and matrix valued functions of appropriate dimensions. Such equations produce a rich class of models all defined in terms of a relatively simple model for the driving noise. The only quantity needing explanation in the expression for x(·) is the term ∫_0^t σ(x(s)) dw(s), to which we now turn.
We will consider stochastic integrals with respect to two basic processes.
The first process is the Wiener process. As is well known, the resulting
stochastic integral and related theory of stochastic differential equations
(due to K. Ito) provide a very convenient family of models that are Marko-
vian and possess continuous sample paths. In the beginning of the section,
we define the Wiener process and recall some basic properties. We then re-
view Ito's definition of integration with respect to the Wiener process and
state the chain rule. In order to model processes involving jumps, we will
make use of Poisson random measures as a driving term. The associated
stochastic integral is, in a certain sense, easier to define than for the case of
the Wiener process. This integral will be defined and the combined chain
rule for both types of driving noise will be given in Section 1.5. If A is a collection of random variables defined on a probability space (Ω, F, P), then we use F(A) to denote the σ-algebra generated by A. If S is a topological space, then B(S) is used to denote the σ-algebra of Borel subsets of S.

Wiener Process. Let (Ω, F, P) be a probability space and let {F_t, t ≥ 0} be a filtration defined on it. A process {w(t), t ≥ 0} is called an F_t-Wiener process if it satisfies the following conditions.

1. w(0) = 0 w.p.1.

2. w(t) is F_t-measurable and F(w(s) − w(t) : s ≥ t) is independent of F_t for all t ≥ 0.

3. The increments w(s) − w(t) are normally distributed with mean 0 and variance σ²(s − t) > 0 for all s > t ≥ 0.

4. The sample paths of w(·) are in C[0, ∞).

For several constructions as well as an account of the detailed properties of the Wiener process we refer the reader to the book of Karatzas and Shreve [83]. If σ = 1, then the process w(·) is called a standard F_t-Wiener process. If F_t is simply F(w(s) : 0 ≤ s ≤ t), then the F_t prefix is often suppressed and we refer to w(·) simply as a Wiener process. A finite collection of mutually independent F_t-Wiener processes is called a vector valued F_t-Wiener process.
A very important property of any F_t-Wiener process is that it is also an F_t-martingale. This property follows from parts (2) and (3) of the definition. Indeed, the Wiener process is the canonical example of a continuous sample path process that is both Markovian and a martingale. The fact that it has continuous sample paths and is also a martingale implies that the sample paths of w(·) are of unbounded variation over any nontrivial time interval (w.p.1). This excludes defining ∫σ(t) dw(t) by any pathwise construction if we wish to allow a large class of integrands. Nonetheless, a useful integral may be defined in a straightforward manner if we properly restrict the class of allowed integrands. We will impose conditions on the integrand which will imply that the resulting integral is an F_t-martingale when considered as a function of the upper limit of integration. Thus, we will be able to use the martingale estimates in the construction and applications of the integral.

Remark. In this book we will always assume the coefficients in the equa-
tions are bounded. This is not much of a restriction for our purposes because
the state spaces of the processes will be bounded for numerical purposes.
Because of this boundedness our definitions are somewhat simpler than is
typical in thorough treatments of the theory of SDE.

Assumptions on the Integrand. A random process f(·) is said to be F_t-adapted if f(t) is F_t-measurable for each t ≥ 0. If w(·) is an F_t-Wiener process and f(·) is F_t-adapted, then f(·) is said to be nonanticipative with respect to w(·), since f(u) and w(s) − w(t) are independent whenever 0 ≤ u ≤ t ≤ s. A process f(t, ω) is called measurable if {(t, ω) : f(t, ω) ∈ A} belongs to the product σ-algebra B([0, ∞)) × F, where B([0, ∞)) is the σ-algebra of Borel subsets of [0, ∞). Let Σ_b(T) denote the set of F_t-adapted, measurable, real valued processes σ(·) which are uniformly bounded in t ∈ [0, T] and ω ∈ Ω, and let Σ_b denote those processes defined on [0, ∞) that are in Σ_b(T) for each T < ∞. We say that a random process is a simple function if there exists a sequence of deterministic times {t_i, i = 0, 1, ...} such that 0 = t_0 < t_1 < ··· < t_i → ∞, and such that σ(t) = σ(t_i) for t ∈ [t_i, t_{i+1}). The set of all simple functions in Σ_b will be denoted by Σ̄_b.

We are now in the position to define the integral with respect to a Wiener process. For full details concerning the arguments used below, the reader may consult the book of Karatzas and Shreve [83]. The integral is defined for an arbitrary integrand in Σ_b via an approximation argument. In general, the stochastic integral defined below will be unique only in the sense that any two versions will have sample paths that agree with probability one. We will follow the usual convention of identifying any process with the class of processes whose sample paths are identical with probability one and, therefore, omit the corresponding qualification in the arguments below.

Definition and Elementary Properties of ∫σ dw when σ ∈ Σ̄_b. Let w(·) be a standard F_t-Wiener process and let σ ∈ Σ̄_b be given. Because the sample paths of σ are piecewise constant, it is possible to define ∫σ dw (referred to hereafter as the integral of σ) in a simple way. Let 0 = t_0 < t_1 < ··· < t_i → ∞ be the partition associated to σ. Then we set

∫_0^t σ(u) dw(u) = Σ_{i=0}^{n−1} σ(t_i) [w(t_{i+1}) − w(t_i)] + σ(t_n) [w(t) − w(t_n)]

for t ∈ [t_n, t_{n+1}). It can be shown [83, Proposition 3.2.6] that for each σ ∈ Σ_b there exist {σ_n, n ∈ ℕ} ⊂ Σ̄_b such that for each T ∈ [0, ∞),

E ∫_0^T |σ_n(u) − σ(u)|² du → 0   (2.1)

as n → ∞. It is then natural to define ∫_0^t σ(u) dw(u) as the limit (in some suitable sense) of the processes ∫_0^t σ_n(u) dw(u). Full details of such a program are in [83, Section 3.2]. Important properties of the resulting integral are that for any 0 ≤ s < t < ∞, any σ, σ_1, and σ_2 in Σ̄_b, we have (w.p.1)

E_{F_s} ∫_0^t σ(u) dw(u) = ∫_0^s σ(u) dw(u),   (2.2)

E_{F_s} [ ∫_s^t σ(u) dw(u) ]² = ∫_s^t E_{F_s} [σ(u)]² du,   (2.3)

∫_0^t σ_1(u) dw(u) + ∫_0^t σ_2(u) dw(u) = ∫_0^t [σ_1 + σ_2](u) dw(u).   (2.4)

These properties follow easily from the definition of the integral for σ ∈ Σ̄_b and are extended to integrands in Σ_b by approximation.
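The definition for simple integrands translates directly into computation. The sketch below is an illustration only; the partition and the values of σ are arbitrary placeholder choices. It evaluates ∫_0^t σ dw along simulated Wiener paths and checks the zero-mean (martingale) property and the isometry (2.3) (with s = 0) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)

# Partition 0 = t_0 < ... < t_n = t and a simple integrand with
# sigma(u) = sigma_vals[i] on [t_i, t_{i+1}); both are arbitrary choices.
t_grid = np.array([0.0, 0.2, 0.5, 0.7, 1.0])
sigma_vals = np.array([1.0, -0.5, 2.0, 0.3])
dt = np.diff(t_grid)

# Simulate many Wiener paths on the partition and form the integral
# sum_i sigma(t_i) [w(t_{i+1}) - w(t_i)], as in the definition above.
n_samples = 200000
dw = rng.normal(0.0, np.sqrt(dt), size=(n_samples, dt.size))
samples = np.sum(sigma_vals * dw, axis=1)

print(np.mean(samples))                                  # approximately 0
print(np.mean(samples**2), np.sum(sigma_vals**2 * dt))   # approximately equal
```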
Given an open subset U of some Euclidean space, we let C^k(U) denote the set of all real valued functions on U that have continuous derivatives up to and including order k.

Ito's Formula. Let f ∈ C¹(R) and let x(t) = x(0) + ∫_0^t b(s) ds. The change of variable formula for the composed function f(x(t)) is of course

f(x(t)) − f(x(0)) = ∫_0^t f_x(x(s)) b(s) ds,

which we can write symbolically as

df(x(t)) = f_x(x(t)) dx(t) = f_x(x(t)) b(t) dt.

The analogous formula for functions of the form h(t) = ∫_0^t σ(s) dw(s), as well as variations, plays an equally important role in stochastic analysis. With the definition of the stochastic integral given above, it turns out that the change of variable formula takes a slightly more cumbersome form and is valid under more restrictive conditions than in the classical case. Nonetheless, it is still an extraordinarily powerful tool. Consider the more general form

x(t) = x(0) + ∫_0^t b(s) ds + ∫_0^t σ(s) dw(s).   (2.5)

We also use the notation

dx(s) = b(s) ds + σ(s) dw(s)

to express the relationship (2.5). We will not state Ito's formula under very general conditions, but only in the form needed later in the book. Thus, we assume b(·) and σ(·) are in Σ_b. Then Ito's formula states that for any f ∈ C²(R),

f(x(t)) − f(x(0)) = ∫_0^t f_x(x(s)) dx(s) + (1/2) ∫_0^t f_xx(x(s)) σ²(s) ds,

where

∫_0^t f_x(x(s)) dx(s) = ∫_0^t f_x(x(s)) b(s) ds + ∫_0^t f_x(x(s)) σ(s) dw(s).

We can write this relationship symbolically as

df(x(t)) = [f_x(x(t)) b(t) + (1/2) f_xx(x(t)) σ²(t)] dt + f_x(x(t)) σ(t) dw(t).

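As a quick illustration (a standard example, not taken from the text), take f(x) = x² with dx(t) = b(t)dt + σ(t)dw(t), so that f_x(x) = 2x and f_xx(x) = 2. Ito's formula then gives

\[
d\bigl(x^2(t)\bigr) = \bigl[\,2x(t)b(t) + \sigma^2(t)\,\bigr]\,dt + 2x(t)\sigma(t)\,dw(t),
\]

and the extra σ²(t) dt term is precisely what distinguishes the stochastic chain rule from the classical one.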

The Vector Cases of Stochastic Integration and Ito's Formula. All vectors are assumed to be column vectors and we use a prime as a superscript to denote transpose. Let w(·) be an n-dimensional standard vector valued F_t-Wiener process. Let σ(·) be a process taking values in the space of k × n real matrices with the property that each entry σ_{ij}(·) is in Σ_b for (i, j) ∈ {1, ..., k} × {1, ..., n}. We then define the integral of σ(·) with respect to w(·) to be the vector valued random process h(·) = (h_1(·), ..., h_k(·))' given by

h_i(t) = Σ_{j=1}^n ∫_0^t σ_{ij}(s) dw_j(s)

for i ∈ {1, ..., k}. Suppose that b(·) = (b_1(·), ..., b_k(·))' and that b_i(·) ∈ Σ_b for each i ∈ {1, ..., k}. Define

x(t) = x(0) + ∫_0^t b(s) ds + ∫_0^t σ(s) dw(s).   (2.6)

We will sometimes use the differential notation

dx(s) = b(s) ds + σ(s) dw(s)

in lieu of (2.6). Then the vector version of Ito's formula is as follows. For any f ∈ C²(R^k), let f_x(·) and f_xx(·) denote the gradient and Hessian matrix of f, respectively. Define a(s) = σ(s)σ'(s). Then

f(x(t)) − f(x(0)) = ∫_0^t f_x'(x(s)) dx(s) + (1/2) ∫_0^t tr[f_xx(x(s)) a(s)] ds,   (2.7)

where

∫_0^t f_x'(x(s)) dx(s) = ∫_0^t f_x'(x(s)) b(s) ds + ∫_0^t f_x'(x(s)) σ(s) dw(s),

and where tr B denotes the trace of any square matrix B.

1.3 Stochastic Differential Equations: Diffusions


Let b(·) and σ(·) be bounded measurable functions mapping R^k into R^k and into the space of real k × n matrices, respectively. We now return to the class of models introduced at the beginning of the previous section. Thus, we consider solutions to

x(t) = x(0) + ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s).   (3.1)

This class of models is widely used in applications in diverse areas. They are characterized in terms of the drift vector b(·) and diffusion matrix a(·) = σ(·)σ'(·). Loosely speaking, such a process is a Markov process x(·) with continuous sample paths and the following "local properties." Assume (for now) that b(·) and a(·) are continuous, and let Δt > 0 be small. Then the "local mean drift" and "local covariance" satisfy

E[x(t + Δt) − x(t) | x(t)] ≈ b(x(t)) Δt,   (3.2)

cov[x(t + Δt) − x(t) | x(t)] ≈ a(x(t)) Δt.   (3.3)

The classical probabilistic tool used in the construction and analysis of solutions to (3.1) is the theory of stochastic differential equations due to K. Ito. We often write equations such as (3.1) in the symbolic form

dx(t) = b(x(t)) dt + σ(x(t)) dw(t).   (3.4)
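The local properties (3.2) and (3.3) can be seen numerically in the simplest time discretization of (3.4), the Euler-Maruyama scheme, in which each small step adds the mean drift b(x)Δt plus a Gaussian term of conditional covariance a(x)Δt. The sketch below is illustrative only: the scalar coefficients are placeholder choices, and this scheme is not the Markov chain approximation method developed later in the book.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T=1.0, n_steps=1000, seed=0):
    """Simulate dx = b(x)dt + sigma(x)dw with the Euler-Maruyama scheme.

    Conditioned on x(t), the increment over a step has mean b(x(t))*dt,
    cf. (3.2), and variance sigma(x(t))**2*dt, cf. (3.3).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        x[i + 1] = x[i] + b(x[i]) * dt + sigma(x[i]) * dw
    return x

# Placeholder coefficients: an Ornstein-Uhlenbeck type example.
path = euler_maruyama(b=lambda x: -x, sigma=lambda x: 0.5, x0=1.0)
print(path[0], path[-1])
```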

Let w(·) be a vector valued F_t-Wiener process and let x(0) be a given F_0-measurable random vector. By a solution to the SDE described by (3.1) we mean a continuous F_t-adapted process x(·) which satisfies (3.1) with probability one.
There are two important notions of the sense in which a solution to (3.4) can be said to exist and also two senses in which uniqueness is said to hold. The distinction will turn out to be important for our purposes.

Strong Existence. We say that strong existence holds if given a probability space (Ω, F, P), a filtration F_t, an F_t-Wiener process w(·) and an F_0-measurable initial condition x(0), then an F_t-adapted process x(·) exists satisfying (3.1) for all t ≥ 0.

Weak Existence. We say that weak existence holds if given any probability measure μ on R^k there exists a probability space (Ω, F, P), a filtration F_t, an F_t-Wiener process w(·), and an F_t-adapted process x(·) satisfying (3.1) for all t ≥ 0 as well as P{x(0) ∈ Γ} = μ(Γ).

Strong existence of a solution requires that the probability space, filtration, and Wiener process be given first and that the solution x(·) then be found for the given data. Weak sense existence allows these objects to be constructed together with the process x(·). Clearly, strong existence implies weak existence.

Strong Uniqueness. Suppose that a fixed probability space (Ω, F, P), a filtration F_t, and an F_t-Wiener process w(·) are given. Let x_i(·), i = 1, 2, solve (3.1) for the given Wiener process w(·). We say that strong uniqueness holds if

P{x_1(0) = x_2(0)} = 1 ⇒ P{x_1(t) = x_2(t) for all t ≥ 0} = 1.

Weak Uniqueness. Suppose we are given weak sense solutions

{(Ω_i, F_i, P_i), F_{i,t}, w_i(·), x_i(·)}, i = 1, 2,

to (3.1). We say that weak uniqueness holds if equality of the distributions induced on R^k by x_i(0) under P_i, i = 1, 2, implies the equality of the distributions induced on C^k[0, ∞) by x_i(·) under P_i, i = 1, 2.

Strong uniqueness is also referred to as pathwise uniqueness, whereas weak uniqueness is often called uniqueness in the sense of probability law. Two relationships between these concepts that are useful are the following: strong uniqueness implies weak uniqueness, and weak existence together with strong uniqueness imply strong existence [83, Chapter 5.3]. For the problems of interest in this book, namely, proving the convergence of numerical schemes, it will be seen that weak solutions that are unique in the weak sense are all that is actually needed.
Ito's Formula and the Differential Operator. Let x(·) be any solution to (3.1), and suppose that the coefficients b(x(·)) and σ(x(·)) satisfy the conditions assumed of b(·) and σ(·) in the last section. The link between the process x(·) and certain second order partial differential equations is provided by Ito's formula and the differential operator that appears therein. Let a(x) = σ(x)σ'(x), and for any f ∈ C²(R^k) let

(Lf)(x) = f_x'(x) b(x) + (1/2) tr[f_xx(x) a(x)].   (3.5)

Then Ito's formula (2.7) states that

f(x(t)) = f(x(0)) + ∫_0^t (Lf)(x(s)) ds + ∫_0^t f_x'(x(s)) σ(x(s)) dw(s).   (3.6)

We next discuss some basic results concerning solutions to SDE's. Let ||·|| denote the norm on the space of real k × n matrices given by ||σ||² = Σ_{i=1}^k Σ_{j=1}^n σ_{ij}².
Picard Iteration. The Picard iteration method is a classical approach to the construction of solutions to ordinary differential equations. Ito extended the method to construct and prove strong uniqueness of strong solutions to (3.1). Although the proof appears in many places, we include an outline here. This is done mainly to facilitate the discussion in Section 1.4 and later sections on reflecting diffusions, for which the details are not so easily available. In order to limit the discussion, here and elsewhere in the chapter we consider only the case where the functions b(·) and σ(·) are nonrandom and independent of time. The reader is referred to the large literature on stochastic differential equations for generalizations.

A3.1. There exists C ∈ (0, ∞) such that

|b(x) − b(y)| ∨ ||σ(x) − σ(y)|| ≤ C|x − y|

for all x ∈ R^k and y ∈ R^k.

Theorem 3.1. Assume (A3.1). Then for every deterministic initial condition x(0), the SDE (3.1) has a strong solution that is unique in the strong (and therefore also in the weak) sense.

The proof turns on the following estimate: let z_i, i = 1, 2, be continuous F_t-adapted processes and define

y_i(t) = x(0) + ∫_0^t b(z_i(s)) ds + ∫_0^t σ(z_i(s)) dw(s),   i = 1, 2,

Δy(t) = y_1(t) − y_2(t), and Δz(t) = z_1(t) − z_2(t). Then for each T ∈ (0, ∞) there exists L ∈ (0, ∞) such that for all 0 ≤ t ≤ T,

E sup_{0≤s≤t} |Δy(s)|² ≤ L ∫_0^t E sup_{0≤r≤s} |Δz(r)|² ds.   (3.7)

This follows directly from the Lipschitz continuity properties of b and σ, the martingale property of the stochastic integral, and the estimate (1.3). For more details, the reader may consult [83, Section 5.2]. From this estimate, one may readily prove strong existence and strong uniqueness, as we now show.
To prove uniqueness, let x_1(·) and x_2(·) both be solutions, and let f(t) = E sup_{0≤s≤t} |x_1(s) − x_2(s)|². By taking x_i(·) = y_i(·) = z_i(·), i = 1, 2, and applying (3.7), we obtain

f(t) ≤ L ∫_0^t f(s) ds.

Then Gronwall's inequality [56] implies f(t) = 0 for all t ∈ [0, T]. Because T is arbitrary, this proves uniqueness in C^k[0, ∞).
A solution to (3.1) can be constructed by a variation on the classical technique of Picard iteration. A sequence of processes {x_n(·)} is defined recursively by x_0(t) = x(0), t ≥ 0, and

x_{n+1}(t) = x(0) + ∫_0^t b(x_n(s)) ds + ∫_0^t σ(x_n(s)) dw(s).

By the way in which the processes were defined, the elements of this sequence are F_t-adapted processes with continuous sample paths. Applying (3.7) for n ≥ 1 with y_1(·) = x_{n+1}(·), y_2(·) = z_1(·) = x_n(·), and z_2(·) = x_{n−1}(·), we obtain

E [ sup_{0≤s≤t} |x_{n+1}(s) − x_n(s)|² ] ≤ L ∫_0^t E [ sup_{0≤r≤s} |x_n(r) − x_{n−1}(r)|² ] ds.

Iterating backward to n = 0 and evaluating the resulting integral yields

E [ sup_{0≤t≤T} |x_{n+1}(t) − x_n(t)|² ] ≤ (LT)^n K / n!,

where

K = E [ sup_{0≤t≤T} ( ∫_0^t b(x(0)) ds + ∫_0^t σ(x(0)) dw(s) )² ]
is finite by (1.3) and (2.3). By Chebyshev's inequality,

P { sup_{0≤t≤T} |x_{n+1}(t) − x_n(t)| ≥ 2^{−n} } ≤ 4^n (LT)^n K / n!.

We may therefore apply the Borel-Cantelli lemma to conclude the event

{ sup_{0≤t≤T} |x_{n+1}(t) − x_n(t)| ≥ 2^{−n} }

occurs infinitely often with probability zero. Therefore, off a set N of zero probability, the sample paths of x_n(·) are a Cauchy sequence in C^k[0, T]. Let x(·, ω) denote the limit of x_n(·, ω) for ω ∉ N. Because T is arbitrary, we can assume that the convergence, in fact, takes place in C^k[0, ∞). Clearly, x(·) is F_t-adapted. Since the assumed continuity properties of b(·) and σ(·), (1.3), and (2.3) imply

∫_0^t b(x_n(s)) ds + ∫_0^t σ(x_n(s)) dw(s) → ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s)

in C^k[0, ∞) (w.p.1), we conclude that x(·) is a solution to the stochastic differential equation (3.4).
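The successive approximation argument can be imitated numerically by fixing a single simulated Wiener path and iterating the map x ↦ x(0) + ∫_0^t b(x(s))ds + ∫_0^t σ(x(s))dw(s) on a time grid; the printed sup-norm differences between iterates then shrink rapidly, mirroring the (LT)^n/n! bound. The sketch below is illustrative only: the sums are crude discretizations of the integrals, and the Lipschitz coefficients are placeholder choices.

```python
import numpy as np

def picard_iterates(b, sigma, x0, T=1.0, n_steps=500, n_iter=8, seed=0):
    """Picard iteration for x(t) = x0 + int_0^t b(x)ds + int_0^t sigma(x)dw
    along one fixed (discretized) Wiener path on a uniform grid."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    x = np.full(n_steps + 1, float(x0))        # x_0(t) = x(0) for all t
    for _ in range(n_iter):
        # x_{n+1}(t_j) = x0 + sum_{i<j} [ b(x_n(t_i)) dt + sigma(x_n(t_i)) dw_i ]
        increments = b(x[:-1]) * dt + sigma(x[:-1]) * dw
        x_new = np.concatenate(([x0], x0 + np.cumsum(increments)))
        print(np.max(np.abs(x_new - x)))       # sup-norm distance between iterates
        x = x_new
    return x

# Placeholder Lipschitz coefficients b(x) = -x, sigma(x) = 0.3*cos(x).
x = picard_iterates(b=lambda x: -x, sigma=lambda x: 0.3 * np.cos(x), x0=1.0)
```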

Solutions Via a Measure Transformation Method. We will briefly outline a very useful method for obtaining weak solutions. The principal application is in cases where b(·) has less regularity than is needed for the Picard iteration technique. Let w(·) be an n-dimensional standard F_t-Wiener process and let z(·) be an n-dimensional process with each component in Σ_b. Define

R(t) = exp ( ∫_0^t z'(s) dw(s) − (1/2) ∫_0^t |z(s)|² ds ).   (3.8)

By Ito's formula

R(t) = 1 + ∫_0^t R(s) z'(s) dw(s).

Because z(·) is bounded, E|R(t)| < ∞. Thus, the process R(·) is a martingale and therefore E R(t) = 1 for all t ∈ [0, ∞).
Now fix T ∈ (0, ∞), and define a probability measure P̃_T on (Ω, F_T) by

P̃_T(A) = ∫_A R(T) dP   for A ∈ F_T.   (3.9)

The equality ER(T) = 1 guarantees that P̃_T is indeed a probability measure.

Theorem 3.2. (Girsanov) Assume that R(·) defined by equation (3.8) is a martingale. Then on the interval [0, T] the process

w̃(t) = w(t) − ∫_0^t z(s) ds   (3.10)

is an F_t-Wiener process on the probability space (Ω, F_T, P̃_T).

The typical use of Theorem 3.2 is in the following situation. Let w(·) be
a k-dimensional F_t-Wiener process. Let w_1(·) and w_2(·) denote the first
n and last k - n components of w(·), respectively. Let σ_1(·) be n × n matrix
valued with the property that σ_1^{-1}(x) is uniformly bounded for x ∈ ℝ^k.
Consider the stochastic differential equation

dx_1(t) = σ_1(x(t))dw_1(t)
dx_2(t) = b_2(x(t))dt + σ_2(x(t))dw_2(t),     (3.11)

where the dimensions of the x_i(·), i = 1, 2, b_2(·), and σ_2(·) are all compatible.
Assume that the drift vector and diffusion matrix of this equation
satisfy the continuity and boundedness conditions assumed for the Picard
iteration method, or any other set of conditions which guarantees the existence
of a weak sense solution. Let b_1(·) be a bounded Borel measurable
function. If we define z_1(·) = σ_1^{-1}(x(·))b_1(x(·)), and R(·), P̃_T, and w̃(·) by
(3.8), (3.9), and (3.10), respectively, then under P̃_T, x(·) solves

dx_1(t) = b_1(x(t))dt + σ_1(x(t))dw̃_1(t)
dx_2(t) = b_2(x(t))dt + σ_2(x(t))dw_2(t).     (3.12)

We thereby obtain weak sense existence for such a class of equations with
bounded measurable drift terms.
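As a numerical illustration of the measure transformation (again only a sketch, with an arbitrarily chosen bounded process z(·) = tanh(w(·)) and an arbitrary time grid, neither taken from the text), the Python fragment below computes the discretized exponential R(T) of (3.8) along simulated Brownian paths and checks the normalization ER(T) = 1 by Monte Carlo.

```python
import numpy as np

def girsanov_weight(z_path, dw, dt):
    """Discretized R(T) = exp( int_0^T z'dw - (1/2) int_0^T |z|^2 dt )."""
    stoch_int = np.sum(z_path * dw)                       # sum_i z(t_i)' dw_i
    quad = 0.5 * dt * np.sum(np.sum(z_path ** 2, axis=1))
    return np.exp(stoch_int - quad)

rng = np.random.default_rng(1)
n_steps, n_paths = 200, 5000
dt = 1.0 / n_steps
weights = np.empty(n_paths)
for k in range(n_paths):
    dw = rng.normal(0.0, np.sqrt(dt), (n_steps, 1))
    # w(t_i) just before the increment dw_i, so that z is nonanticipative.
    w = np.concatenate((np.zeros((1, 1)), np.cumsum(dw, axis=0)[:-1]))
    z = np.tanh(w)                                        # a bounded z(.)
    weights[k] = girsanov_weight(z, dw, dt)
print(weights.mean())   # should be close to 1, since ER(T) = 1
```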
It is also possible to use Girsanov's theorem to prove weak sense uniqueness.
For the details under general conditions, the reader can consult [75,
83]. However, the basic idea which is involved can be outlined for the example
above. It must first be observed that the distribution of x(·) under
P uniquely determines its distribution under P̃_T. Indeed, the uniform invertibility
of σ_1(·) implies that the distribution of x(·) determines that of
w_1(·), and therefore that of the Radon-Nikodym derivative R(·) as well.
Next, consider two weak sense solutions to (3.12). Applying Girsanov's
theorem to these two solutions we may remove the b_1 part of the drift term
and produce two weak sense solutions to (3.11). Suppose the assumptions
made on the coefficients appearing in (3.11) imply weak sense uniqueness
(e.g., (A3.1)). Then the distribution of the two weak sense solutions to
(3.11) must be identical, and by the preceding comments the same is true
of the two weak sense solutions of (3.12).

Remark. Clearly, weak sense uniqueness is really a statement concerning


the measure induced by the process on Ck [0, T]. Suppose that a weak sense
uniqueness result holds for every deterministic initial condition. Then weak
sense uniqueness must also follow for an arbitrary initial distribution.

Controlled Diffusions. Suppose that w(·) is a Wiener process on some
probability space (Ω, F, P). Let U be a compact subset of some Euclidean
space, and let u(·) be a U-valued, measurable process also defined on
(Ω, F, P). We can topologize the set of such controls in any convenient way,
e.g., the topology inherited from the L_1 metric. We say that the control u(·)
is nonanticipative with respect to the Wiener process w(·) if there exists
a filtration F_t defined on (Ω, F, P) such that u(·) is F_t-adapted and w(·)
is an F_t-Wiener process. We say that u(·) is an admissible control law
with respect to w(·), or that the pair (u(·), w(·)) is admissible, if u(·) is
nonanticipative with respect to w(·).
In this section, we will consider controlled diffusions of the form

dx(t) = b(x(t), u(t))dt + σ(x(t))dw(t),     (3.13)

which is the differential representation of

x(t) = x(0) + ∫_0^t b(x(s), u(s))ds + ∫_0^t σ(x(s))dw(s).

In analogy with the uncontrolled case, we have the following definitions.

Strong Existence. We say that strong existence holds if given a probability
space (Ω, F, P), a filtration F_t, an F_t-Wiener process w(·), a control
process u(·) that is F_t-adapted, and an F_0-measurable initial condition
x(0), then an F_t-adapted process x(·) exists satisfying (3.13) for all t ≥ 0.

Weak Existence. Suppose we are given probability distributions Λ and μ
on the sample path space of the pair (u(·), w(·)) and on ℝ^k, respectively. We
say that weak existence holds if there exists a probability space (Ω, F, P),
a filtration F_t, an F_t-Wiener process w(·), an F_t-adapted control process
u(·) (i.e., the pair (u(·), w(·)) is admissible), and an F_t-adapted process
x(·) such that Λ and μ are the distributions induced by (u(·), w(·)) and
x(0) under P, and such that x(·) satisfies (3.13) for all t ≥ 0.

Strong Uniqueness. Suppose that a fixed probability space (Ω, F, P), a
filtration F_t, an F_t-Wiener process w(·), and a control process u(·) that is
F_t-adapted are given. Let x_i(·), i = 1, 2, solve (3.13) for the given Wiener
process and control process. We say that strong uniqueness holds if

P{x_1(0) = x_2(0)} = 1  ⟹  P{x_1(t) = x_2(t) for all t ≥ 0} = 1.

Weak Uniqueness. Assume that we are given weak sense solutions
to (3.13). We say that weak uniqueness holds if equality of the joint distributions
of (u_i(·), w_i(·), x_i(0)) under P_i, i = 1, 2, implies the equality of the
distributions of (x_i(·), u_i(·), w_i(·), x_i(0)) under P_i, i = 1, 2.

Strong solutions that are unique in the strong sense and weak solutions
that are unique in the weak sense may be constructed in exactly the same
way as for the case of uncontrolled diffusions. For example, assume the
following analogue of (A3.1).
A3.2. There exists C ∈ (0, ∞) such that

|b(x, u) - b(y, u)| ∨ ||σ(x) - σ(y)|| ≤ C|x - y|

for all x ∈ ℝ^k, y ∈ ℝ^k, and u ∈ U. Furthermore, the function b : ℝ^k × U →
ℝ^k is measurable, and sup_{u∈U} |b(0, u)| < ∞.
Under this assumption the Picard iteration and Girsanov transformation
methods may be used to get existence and uniqueness results analogous to
those without control. The details are the same as those in the case with
no control as long as the control process is Ft -adapted and measurable,
with the only changes in the proofs being notational.
For fixed α ∈ U, we define the differential operator L^α by

(L^α f)(x) = f'_x(x)b(x, α) + (1/2)tr [f_xx(x)a(x)].

If u(·) is an admissible control process and if x(·) is the associated solution
to (3.13), then Itô's formula continues to hold in the form

f(x(t)) = f(x(0)) + ∫_0^t (L^{u(s)} f)(x(s))ds + ∫_0^t f'_x(x(s))σ(x(s))dw(s).


1.4 Reflected Diffusions
In this section we will review results concerning diffusion processes that are
"instantaneously" reflected back into a closed domain G when the process
tries to leave G. One of the reasons for the wide applicability of diffusion
processes is that they serve as mathematically tractable approximations to
the actual "physical" processes that are encountered in applications. For
example, a diffusion process could be used to approximate a process defined
by driving an ODE by some "wide band" stationary noise process. Other
examples occur when a process that is originally defined in discrete time is
suitably interpolated and scaled and then replaced by a diffusion. A large
number of examples of both types may be found in [93].
An example of a process that can be well approximated by a diffusion pro-
cess is that of a queueing process under certain assumptions (the so-called
heavy traffic assumptions) and after a suitable rescaling of the space and
time variables. Here, the nonnegativity of the components of the original
process implies a constraint on the state space of the approximating diffu-
sion process. It turns out that the proper way to implement this constraint
in the diffusion approximation is by adding what is known as a reflecting
boundary. Reflecting diffusions often give the appropriate diffusion approx-
imations for processes that are constrained in some way to remain in a
given set.

In order to motivate the definition given below for a reflecting diffusion,


we consider a simple example. Let {ξ_i, i < ∞} be a sequence of independent
and identically distributed (iid) ℝ^k-valued random variables. Assume that

Eξ_i = 0,   cov ξ_i = Eξ_iξ_i' = σσ',

where the prime denotes transpose, cov stands for covariance, and σ is a
k × k matrix. Let [a] denote the integer part of a. It is well known that the
process

x^n(t) = n^{-1/2} Σ_{i=1}^{[tn]} ξ_i + x

tends weakly (see Section 9.1 for the definition) to the solution of the SDE

dx(t) = σdw(t),   x(0) = x,

as n → ∞. Suppose that we consider a constrained version of this process.
We will use a very simple constraint mechanism. However, the basic ideas
carry over to more complicated constraint mechanisms, such as those in
queueing systems. Let v be any vector with positive first component v_1,
and let G = {x : x_1 ≥ 0}. For x ∉ G, define t_x = inf{t > 0 : x + tv ∈ G}
and π(x) = x + t_x v. For x ∈ G, let π(x) = x. Thus, π(·) is the projection
onto G "along" v. For each n < ∞, we define processes {x_i^n, i < ∞},
{y_i^n, i < ∞}, and {z_i^n, i < ∞} by (x_0^n, y_0^n, z_0^n) = (x, x, 0), and

x_{i+1}^n = π(x_i^n + ξ_i),
y_{i+1}^n = y_i^n + ξ_i,
z_{i+1}^n = z_i^n + [π(x_i^n + ξ_i) - (x_i^n + ξ_i)].

Define the interpolations x^n(t) = n^{-1/2} x^n_{[tn]}, y^n(t) = n^{-1/2} y^n_{[tn]}, and
z^n(t) = n^{-1/2} z^n_{[tn]}. Then we can write x^n(t) = y^n(t) + z^n(t).

Because the process y^n(·) does not involve the constraint, its behavior is
easy to determine. As remarked previously, y^n(·) will converge weakly to
the solution to dy(t) = σdw(t), y(0) = x. The process z^n(·) plays the role
of a bookkeeping device, by recording the effects of the projections in such
a way that x^n(·) can be recovered from y^n(·) by adding z^n(·). The process z^n(·)
has three important properties. First, z^n(·) can change only at times t such
that x^n(t) ∈ ∂G. Thus, the evolution of x^n(·) is the same as that of y^n(·)
when x^n(·) is away from ∂G. Second, the direction of change of z^n(·) is
determined by the constraint mechanism, and the last property is that the
change in z^n(·) at any given time is the minimal amount needed to keep
x^n(·) in G.
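The decomposition x^n = y^n + z^n is easy to reproduce numerically. The following Python sketch (with an arbitrary Gaussian increment distribution and an arbitrary oblique direction v, both assumptions made only for the illustration) builds the free walk y, the projected walk x, and the correction term z for the projection π onto G = {x : x_1 ≥ 0} along v.

```python
import numpy as np

def project(x, v):
    """pi(x): return x if x_1 >= 0, else push along v (v_1 > 0) until x_1 = 0."""
    if x[0] >= 0.0:
        return x
    t = -x[0] / v[0]
    return x + t * v

def constrained_walk(xi, x0, v):
    """Free walk y, projected walk x, and correction z = x - y, started at x0."""
    x = np.array(x0, dtype=float)
    y = x.copy()
    v = np.asarray(v, dtype=float)
    xs, ys, zs = [x.copy()], [y.copy()], [x - y]
    for inc in xi:
        y = y + inc
        x = project(x + inc, v)
        xs.append(x.copy()); ys.append(y.copy()); zs.append(x - y)
    return np.array(xs), np.array(ys), np.array(zs)

rng = np.random.default_rng(2)
xi = rng.normal(0.0, 1.0, (500, 2))                 # iid increments, k = 2
xs, ys, zs = constrained_walk(xi, x0=[1.0, 0.0], v=[1.0, 0.5])
# x stays in G, x = y + z, and z changes only when the projection acts.
print(xs[:, 0].min() >= -1e-12, np.allclose(xs, ys + zs))
```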

For the scaled queueing systems referred to at the beginning of this


section, as well as many other constrained processes, there are analogous
decompositions into an essentially "unconstrained" part for which it is rel-
atively easy to find a diffusion approximation, plus a term that records the
effects of the constraint. Examples can be found in Chapter 8.
The properties of y^n(·) and z^n(·) given in the example above suggest that
the following properties should be expected of any diffusion x(·) claiming
to approximate x^n(·) (for large n).

• Away from the constraining boundary ∂G that is inherited from the
constraint on x^n(·), the process x(·) should behave as the ordinary
diffusion approximation to y^n(·).

• When x(·) attempts to cross ∂G it should be returned to G by the
action of a compensating process. The direction in which this compensating
process "pushes" x(·) will be determined by the constraint
mechanism and should be the minimal amount required to keep x(·)
in G.

It will be seen below that reflecting diffusions satisfy these properties.


Owing to the range of processes for which reflected diffusions are the appro-
priate diffusion approximation, we must deal with a wide variety of possible
settings, e.g., domains with corners, oblique reflections, and multi-valued
directions of reflection.
In addition to problems in which reflecting boundaries arise as the result
of a diffusion approximation, there are problems in which they are used to
bound the state space of a process for numerical purposes. See, in particular,
Chapter 11.

Remark. As the discussion above suggests, the term "reflecting" is a bit of


a misnomer. It may be that "constrained" diffusion with a few more terms
describing the method of constraint would be more accurate. Nonetheless
we will follow current convention and retain the adjective "reflecting."

The approach taken in this book to the study of reflecting diffusions is


based on the use of the Skorokhod Problem. This approach goes back to
Skorokhod in the case of one dimension and has been used by several au-
thors since then in a variety of contexts [27, 42, 41, 47, 69, 115, 134, 147].
It is also possible to base an approach to the study of reflecting diffusions
on the submartingale problem [144]. The submartingale problem gives an
alternative method of characterizing processes with certain boundary be-
haviors. There are some processes for which the only existing proofs of weak
uniqueness (which will be essential for our purposes) are based on the sub-
martingale problem formulation. Examples are diffusions on domains with
"sticky" boundary conditions [144] and certain classes of reflecting Brow-
nian motion models [149]. We have chosen to use the Skorokhod Problem

approach to reflecting diffusions because it is intuitively appealing and rel-


atively simple to apply. However, the reader should be aware that the basic
property we will need for the numerical methods developed in this book to
apply to a given reflecting diffusion model is the weak sense uniqueness.
We note that because the definition of a weak sense solution is independent
of the method of characterization, the methods of generating convergent
numerical schemes presented in Chapter 5 and elsewhere and the proofs of
convergence are applicable in general, regardless of the particular method
used to prove the uniqueness.
Let G ⊂ ℝ^k be a closed set and assume that G is the closure of its
interior. To each point x ∈ ∂G, we associate a set r(x) ⊂ {y ∈ ℝ^k : |y| = 1}
called the directions of reflection. Let ψ ∈ C^k[0, ∞) be a given path that
satisfies ψ(0) ∈ G. For a function of bounded variation η mapping [0, ∞)
to ℝ^k, we let |η|(t) denote the total variation over the interval [0, t], and
let μ_η denote the measure on [0, ∞) which is defined by the total variation.
The precise definition of the Skorokhod Problem is as follows.

Definition 4.1. (Skorokhod Problem) Let ψ ∈ C^k[0, ∞) with ψ(0) ∈
G be given. Then (φ, η) solves the Skorokhod Problem for ψ (with respect
to G and r) if

1. φ = ψ + η, φ(0) = ψ(0),

2. φ(t) ∈ G for t ∈ [0, ∞),

3. |η|(t) < ∞ for all t < ∞,

4. |η|(t) = ∫_(0,t] I_{φ(s)∈∂G} d|η|(s),

5. there exists measurable γ : [0, ∞) → ℝ^k such that γ(s) ∈ r(φ(s)) (μ_η
a.e.) and η(t) = ∫_(0,t] γ(s) d|η|(s).

Although the definition appears abstract, it gives a very convenient mathematical
tool for the construction and analysis of reflecting diffusions. The
level of abstraction is in part due to the fact that we need to define a reflected
version for each path in C^k[0, ∞). It is possible to give a simpler
definition if we restrict to only smooth paths, but this is not sufficient if
we wish to deal with reflecting diffusions. Some comments on the definition
are as follows. The function φ is to be viewed as the natural constrained
version of ψ. Part 1 of the definition states that φ starts at the same point
as ψ, and that φ will be obtained from ψ by adding a function η which will
"push" φ in the proper direction at the proper time. Part 2 simply states
that φ is constrained to G. Part 3 is useful in establishing a semimartingale
decomposition for reflected diffusions and for obtaining estimates for such
processes. Part 4 implies that η may only push φ while φ is on the boundary
of G and that η does not change while φ is off the boundary. Thus, the evolution
of φ duplicates that of ψ when φ is away from the boundary. Lastly,
part 5 implies that η is only allowed to push in the directions consistent with
the current position of φ when it is on the boundary.
A comparison of these properties with the properties that one would ex-
pect of the sample paths of reflecting diffusions indicates a close connection.
Our definition of a stochastic differential equation with reflection (SDER)
is as follows. (The definition will be given without a control. Extending the
definition to include a control is straightforward.) Consider a probability
space (n, F, P) on which is defined a filtration {Ft. t ;::: 0}. Let b( ·) and u( ·)
be functions of the appropriate dimensions and suppose that {w(t), t;::: 0}
is an r-dimensional standard Fr Wiener process.

Definition 4.2. (SDER) The Ft-adapted process x(·) is a solution to


the SDER for the domain G, directions of reflection r(·), initial condition
x(O) E G and Brownian motion {w(t), t ;::: 0}, if x(t) E G for all t ;::: 0
(w.p.1), and

x(t) = x(O) +lot b(x(s))ds +lot u(x(s))dw(s) + z(t),

where
lzl(t) =lot I{:z:(s)EBG}dlzl(s) < oo,

and where there exists measurable 'Y(s) E r(x(s)) (J.tz a.e.) such that

z(t) =lot 'Y(s)dlzl(s)

(w.p.1).

In other words, (x(·),z(·)) should solve (on a pathwise and w.p.1 basis)
the SP for 1/J(·) = x(O) +I~ b(x(s))ds +I~ u(x(s))dw(s).
As in the case of diffusions without boundaries, there are two senses in
which solutions can be said to exist and also two senses in which they
can be said to be unique. These definitions are simply the exact analogues
of those for the case of no boundary. In general, the existence of strong or
weak solutions to the SDER and the relevant uniqueness properties depend
on regularity properties of G and r( ·). Those approaches that are based on
the Skorokhod Problem use these regularity properties to derive estimates
for the mapping ψ → (φ, η) defined by solving the Skorokhod Problem.
We next present a few very simple examples for which this mapping is Lip-
schitz continuous. This allows an elementary derivation of (3.7), after which
the analysis proceeds just as in the case of unconstrained diffusions. The
basic differences between these examples and the more general situations
considered in [41, 115, 134, 147] are the more involved calculations required
to get (3.7).

Example 4.3. (Anderson and Orey) [5] Let n be a given unit vector,
and let G = {x : ⟨x, n⟩ ≤ 0}. Thus, n is the outward normal
at all points of ∂G. Let r be a unit vector that satisfies ⟨r, n⟩ < 0. We will
take r(x) = {r} for all x ∈ ∂G. Suppose that ψ(·) is given with ψ(0) ∈ G.
If we define

|η|(t) = - ( 0 ∨ sup_{0≤s≤t} ⟨ψ(s), n⟩ ) / ⟨r, n⟩,

η(t) = |η|(t) r,

φ(t) = ψ(t) + η(t),

then (φ(·), η(·)) solves the Skorokhod Problem for ψ(·). From these equations,
it is easy to show that if (φ_i(·), η_i(·)) solves the Skorokhod Problem
for ψ_i(·), i = 1, 2, then

sup_{0≤s≤t} |η_1(s) - η_2(s)| ≤ ( -1/⟨r, n⟩ ) sup_{0≤s≤t} |ψ_1(s) - ψ_2(s)|,     (4.1)

and, therefore,

sup_{0≤s≤t} |φ_1(s) - φ_2(s)| ≤ ( 1 - 1/⟨r, n⟩ ) sup_{0≤s≤t} |ψ_1(s) - ψ_2(s)|.     (4.2)

These estimates allow a very simple study of the corresponding SDER.
Let φ(·) = Γ(ψ(·)) denote the mapping ψ(·) → φ(·), and let z_i, i = 1, 2, be
measurable, F_t-adapted processes with continuous sample paths. Define

ỹ_i(t) = x(0) + ∫_0^t b(z_i(s))ds + ∫_0^t σ(z_i(s))dw(s)

and y_i(t) = Γ(ỹ_i)(t) for t ≥ 0 and i = 1, 2. Obviously, the ỹ_i(·) are measurable,
F_t-adapted processes with continuous sample paths. Equations (4.1)
and (4.2) imply uniqueness of the pair (φ, η) given ψ. If x ∈ G and ψ(t) = x
for all t ≥ 0, then (φ(t), η(t)) = (x, 0) is the solution to the SP. This fact
together with (4.2) imply the continuity of φ whenever ψ is continuous
and φ = Γ(ψ). Thus, the mapping Γ is a Lipschitz continuous mapping of
C^k[0, ∞) into itself. Furthermore, uniqueness of the solution to the SP and
our explicit representation of the solution imply that φ(t) is a measurable
function of {ψ(s), s ∈ [0, t]}. Therefore, the processes y_i(·), i = 1, 2, are
measurable, F_t-adapted processes with continuous sample paths. Define
Δỹ(t) = ỹ_1(t) - ỹ_2(t) and also Δy(·) and Δz(·) in an analogous fashion.
Then, under (A3.1), equation (3.7) gives

E [ sup_{0≤s≤t} |Δỹ(s)|^2 ] ≤ L ∫_0^t E [ sup_{0≤u≤s} |Δz(u)|^2 ] ds

for some L ∈ (0, ∞). Using (4.2), we have

E [ sup_{0≤s≤t} |Δy(s)|^2 ] ≤ L_1 ∫_0^t E [ sup_{0≤u≤s} |Δz(u)|^2 ] ds,

where L_1 ∈ (0, ∞). Therefore, the existence, uniqueness, and other properties
follow from the same arguments as those used for the unreflected
versions in Section 1.3. Note that we should now define the sequence used
in the existence proof by x_0(t) = x(0),

x̃_{n+1}(t) = x(0) + ∫_0^t b(x_n(s))ds + ∫_0^t σ(x_n(s))dw(s),

x_{n+1}(t) = Γ(x̃_{n+1})(t),   z_{n+1}(t) = x_{n+1}(t) - x̃_{n+1}(t).

Then (x_n(·), z_n(·)) converges to a pair of processes (x(·), z(·)), where x(·)
is a solution to the SDER. •
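The explicit formulas of Example 4.3 translate directly into a one-pass computation on a discretized path. The Python sketch below (the path, the normal n, and the direction r are chosen arbitrarily for the illustration) evaluates |η|, η, and φ = ψ + η on a grid and checks that φ remains in G = {x : ⟨x, n⟩ ≤ 0}.

```python
import numpy as np

def skorokhod_halfspace(psi, n, r):
    """Example 4.3 on a grid: G = {x : <x,n> <= 0}, single reflection direction r.

    psi: (m, k) array of path values with psi[0] in G; n, r unit vectors, <r,n> < 0.
    Returns (phi, eta) with phi = psi + eta.
    """
    n = np.asarray(n, dtype=float)
    r = np.asarray(r, dtype=float)
    s = psi @ n                                              # <psi(t), n>
    eta_mag = -np.maximum(0.0, np.maximum.accumulate(s)) / (r @ n)
    eta = np.outer(eta_mag, r)
    return psi + eta, eta

rng = np.random.default_rng(3)
# A random-walk path started inside G = {x : x_1 <= 0}, i.e., n = e_1.
psi = np.array([-1.0, 0.0]) + np.cumsum(rng.normal(0.0, 0.05, (2000, 2)), axis=0)
phi, eta = skorokhod_halfspace(psi, n=[1.0, 0.0], r=[-0.8, 0.6])
print(phi[:, 0].max() <= 1e-12)                              # phi stays in G
```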

Suppose that G possesses a smooth boundary with exterior normal n( ·)


and that r(·) is a smooth vector field satisfying ⟨r(x), n(x)⟩ < 0 for all
x ∈ ∂G. Then, by use of appropriate local coordinate systems and stopping
times, the results of Example 4.3 imply strong existence and uniqueness for
the SDER corresponding to G and r(·) [5].
The next example considers the choice of G and r( ·) that is probably most
natural when a reflecting boundary condition must be imposed simply for
the purposes of bounding the state space.

Example 4.4. Let d_i > c_i, i = 1, ..., k, be given, and define

G = {x : c_i ≤ x_i ≤ d_i for i = 1, 2, ..., k}.

Let n(·) denote the set of outward normals and suppose r(·) = -n(·). Thus,
the constraining action is applied along the direction of the inward normal.
Note that r(·) is multi-valued at any of the "corner" points of ∂G. It is
proved in [42] that the solution mapping to the Skorokhod Problem is again
Lipschitz continuous in the sense of equations (4.1) and (4.2), although the
coefficients in this case are k^{1/2} and k^{1/2} + 1, respectively. Therefore, the SDER
may be solved and uniqueness proved just as in Example 4.3. •

Remarks. In general, when good estimates are available on the Skorokhod


Problem, analogues of the statements regarding existence, uniqueness, and
so on that hold for the case of unreflected processes carry over to the reflected
case. It should also be noted that the Girsanov transformation method
(see Section 1.3) applies here as well and in precisely the same way. Other
examples of reflecting or constrained processes will appear in later chapters.

1.5 Processes with Jumps


In this section we will discuss Markov models with jumps. We are interested
in models of the form

x(t) - x(0) = ∫_0^t b(x(s))ds + ∫_0^t σ(x(s))dw(s) + J(t),

where b(·) and σ(·) are as in the preceding sections and the J(t) term
produces the jumps. For the jump term we would like to specify (at least
approximately) the probability that a jump occurs in any small time interval
together with the distribution of any resulting jumps as functions of
the past history of the process. Between jumps, the term J(·) is constant.
In order to preserve the Markov property, the "jump intensity" at time t
and distribution of any jumps at time t should depend only on lim_{s↑t} x(s).
Let λ(·) be a function mapping ℝ^k into [0, ∞), and let Π̄(x, dy) be a
probability transition kernel on ℝ^k. For now assume that λ(·), b(·), and
σ(·) are all continuous and that Π̄(x, ·) is continuous in x in the topology of
weak convergence. Let Δt > 0 be small and fix t. A rough description of the
term J(·) that is consistent with the properties described above is as follows.
Let x(t-) = lim_{s↑t} x(s). With probability equal to λ(x(t-))Δt + o(Δt), J(·)
will jump once at some time in the interval [t, t + Δt]. The probability of
two or more jumps is o(Δt). Thus, λ(·) gives the overall jump rate. Given
that a jump has occurred, its distribution will be given approximately by
Π̄(x(t-), ·). Between jumps, the process x(·) behaves like a diffusion process
with no jumps and with the local properties described by b(·) and σ(·).
The general theory and treatments of various approaches to such pro-
cesses can be found in [75, 78, 79]. For the purposes of this book, the pro-
cesses may be constructed and the needed properties proved using rather
simple arguments.

Construction of the Process. Under the assumptions we will make be-


low, the weak sense existence and uniqueness properties for this class of
jump diffusion processes are essentially determined by the corresponding
properties of the analogous diffusion with no jumps. For example, assume
that (A3.1) holds. This assumption is sufficient to guarantee the existence
of a unique strong sense solution to the diffusion with no jumps. It will
turn out to imply (under our assumptions only, and not in the general case
as considered in [79]) the corresponding result for the jump diffusion. A
construction of the process is as follows. Assume that we are given a filtration
F_t on a probability space (Ω, F, P), together with an F_t-Wiener
process w(·). Let Π(·) be a probability measure on the Borel subsets of ℝ^n
that has compact support Γ. Suppose that on the same probability space
there are defined mutually independent sequences of iid random variables
{τ_n, n < ∞} and {ρ_n, n < ∞}, where the τ_n are exponentially distributed
with mean 1/λ and the ρ_n have distribution Π(·). Assume also that these
random variables are independent of w(·). Let ν_0 = 0 and ν_{n+1} = ν_n + τ_n.
The ν_n will be the jump times of the process. Let q : ℝ^k × ℝ^n → ℝ^k be
a bounded measurable function. Starting with a given initial condition x,
we construct a solution x_1(·) to

dx_1(t) = b(x_1(t))dt + σ(x_1(t))dw(t),   x_1(0) = x.

Then define

x(t) = x_1(t) for t ∈ [0, ν_1),   x(ν_1) = x_1(ν_1) + q(x_1(ν_1), ρ_1).

We continue by repeating the process. We define x_2(·) as the solution on [ν_1, ∞) of

dx_2(t) = b(x_2(t))dt + σ(x_2(t))dw(t),   x_2(ν_1) = x(ν_1),

and then define

x(t) = x_2(t) for t ∈ (ν_1, ν_2),   x(ν_2) = x_2(ν_2) + q(x_2(ν_2), ρ_2),

and so on. The process thus constructed will be defined for all t ≥ 0 since
ν_n → ∞ as n → ∞ (w.p.1). The mutual independence of the components
used to construct the process implies the Markov property.
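The interlacing construction can be mimicked in simulation: run an Euler step of the diffusion between candidate jump times with exponential (mean 1/λ) gaps, and apply x ↦ x + q(x, ρ) at those times. The Python sketch below is only illustrative; the scalar coefficients, λ = 2, the uniform Π, and q(x, ρ) = ρ are assumptions made for the example, not part of the text.

```python
import numpy as np

def simulate_jump_diffusion(b, sigma, q, lam, sample_rho, x0, T, dt, rng):
    """Euler version of the interlacing construction for dx = b dt + sigma dw + dJ."""
    t, x = 0.0, float(x0)
    next_jump = rng.exponential(1.0 / lam)        # nu_1 = tau_1, mean 1/lam
    times, path = [t], [x]
    while t < T:
        h = min(dt, T - t, max(next_jump - t, 1e-12))
        x += b(x) * h + sigma(x) * np.sqrt(h) * rng.normal()   # diffusion between jumps
        t += h
        if t >= next_jump:
            x += q(x, sample_rho(rng))                          # jump by q(x(nu_n-), rho_n)
            next_jump += rng.exponential(1.0 / lam)             # nu_{n+1} = nu_n + tau_{n+1}
        times.append(t)
        path.append(x)
    return np.array(times), np.array(path)

rng = np.random.default_rng(4)
times, path = simulate_jump_diffusion(
    b=lambda x: -x, sigma=lambda x: 0.3, q=lambda x, rho: rho, lam=2.0,
    sample_rho=lambda rng: rng.uniform(-0.5, 0.5), x0=1.0, T=5.0, dt=1e-3, rng=rng)
print(len(times), float(path[-1]))
```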
The process we have constructed satisfies the description given at the
beginning of the section, with

λ(x) = λ ∫_{p: q(x,p)≠0} Π(dp) ≤ λ,     (5.1)

and for x such that λ(x) ≠ 0,

Π̄(x, A) = (λ / λ(x)) ∫_{p: q(x,p)∈A, q(x,p)≠0} Π(dp).     (5.2)

(Note that the definition of Π̄ is unimportant if λ(x) = 0.) Consider the
following assumptions on λ(·) and Π̄(·, ·).

A5.1. The function λ(x) is continuous and uniformly bounded, and the
support of Π̄(x, ·) is contained in some compact set that is independent
of x. Furthermore, Π̄(x, ·) is continuous as a mapping from x ∈ ℝ^k
into the space of probability measures endowed with the topology of weak
convergence.

It can be shown [79] that if λ(·) and Π̄(·, ·) satisfy (A5.1), then a λ < ∞,
a probability measure Π(·), and a bounded measurable function q(·, ·) can
be found so that (5.1) and (5.2) hold. For convenience, we will consider λ,
Π(·), and q(·, ·) as characterizing the jump part of a jump diffusion.

The same approach to constructing jump diffusions can be used if we


weaken the assumptions on b(·) and σ(·). For example, the Girsanov trans-
formation method (see Section 1.3) can be used exactly as before to con-
struct a weak sense solution for cases in which the drift is discontinuous.
It will turn out to be useful to have a representation of these processes as
solutions to a SDE, analogous to that of diffusions without jumps. In order
to account for the jumps, an additional driving term is introduced in the
form of a Poisson random measure. The definition we use is not the most
general, but is sufficient for our purposes. By an integer valued measure
we mean a measure that always yields an integer as the measure of any
measurable set.

Poisson Random Measures. Assume that we are given a probability


space (Ω, F, P) on which a filtration F_t is defined and a probability measure
Π(·) on the Borel subsets of ℝ^n. Let F_{t-} be the σ-algebra generated
by the union of F_s, s < t. Then an F_t-Poisson random measure with
intensity measure h(dt dy) = λdt × Π(dy) is a measurable mapping N(·) from
(Ω, F, P) into the space of integer valued positive measures on ℝ^+ × ℝ^n
with the following properties:

1. For every t ≥ 0 and every Borel subset A of [0, t] × ℝ^n, N(A) is
F_t-measurable.

2. For every t ≥ 0 and every Borel subset A of [t, ∞) × ℝ^n, N(A) is
independent of F_{t-}.

3. E[N(A)] = h(A) for every Borel subset A of ℝ^+ × ℝ^n.

A Poisson random measure is the counting measure of a Poisson point
process on ℝ^+ × ℝ^n. We use the "points" of a realization of the random
measure to identify the times and the magnitudes of jumps appearing in the
jump Markov process defined by the stochastic differential equation given
below. It can be shown [79] that, for any sequence {A_i} of pairwise disjoint
Borel subsets of ℝ^+ × ℝ^n, the random variables N(A_i) are independent,
and, furthermore, each N(A_i) is a Poisson random variable with mean value
h(A_i). The Poisson random measure representation provides a convenient
representation for the jump term, in that it provides the correct relation
between the filtration of the process and the desired properties of the times
between jumps and jump distribution. Suppose that {(ν_i, ρ_i), i < ∞} are
the point masses of the Poisson random measure. Conditioned on F_{t-}, the
random variable inf{ν_i - t : ν_i ≥ t} is exponentially distributed with mean
1/λ. Given that (ν, ρ) is a point mass of N(·), ρ is distributed according to
Π(·) (w.p.1).
Let F_t be a filtration on a probability space (Ω, F, P), and let w(·) and
N(·) be an F_t-adapted Wiener process and Poisson random measure, respectively.
By a solution of the SDE

dx(t) = b(x(t))dt + σ(x(t))dw(t) + dJ(t),     (5.3)

J(t) = ∫_{[0,t]×ℝ^n} q(x(s-), ρ)N(ds dρ),

together with the F_0-measurable initial condition x(0), what is meant is
an F_t-adapted process x(·) with paths in D^k[0, ∞) which satisfies the
integrated form

x(t) = x(0) + ∫_0^t b(x(s))ds + ∫_0^t σ(x(s))dw(s) + ∫_{[0,t]×ℝ^n} q(x(s-), ρ)N(ds dρ).     (5.4)

In complete analogy with the case of diffusions with no jumps, we have definitions
of weak and strong existence, and weak and strong uniqueness.
Because we will only use the weak sense properties in both cases, statements
are only given for these cases.

Weak Existence. We say that weak existence holds if given any probability
measure μ on ℝ^k, there exists a probability space (Ω, F, P), a filtration
F_t, an F_t-Wiener process w(·), an F_t-Poisson random measure N(·),
and an F_t-adapted process x(·) satisfying (5.4) for all t ≥ 0, as well as
P{x(0) ∈ A} = μ(A).

Weak Uniqueness. Suppose we are given two weak sense solutions
to (5.4). We say that weak uniqueness holds if equality of the distributions
induced on ℝ^k by x_i(0) under P_i, i = 1, 2, implies equality of the
distributions induced on D^k[0, ∞) by x_i(·) under P_i, i = 1, 2.

Existence and Uniqueness of Solutions. As remarked previously, the


weak sense existence and uniqueness properties of these jump diffusions
basically follow from the corresponding properties in the case of no jumps.
Weak sense existence has been discussed already. Weak sense uniqueness
will also hold for the jump diffusion if it holds for the corresponding case
without jumps [this assertion is true under (A5.1), but not under more
general assumptions that are often used, e.g., [79]]. This follows under our
assumptions because with probability one, the jump times are isolated and
tend to infinity. Let x_1 and x_2 be weak sense solutions to (5.4) that start at
x (we neglect specifying the probability spaces, the Wiener processes, etc.).
Let ν_1 and ν_2 be the times of the first atoms for the Poisson random measures
corresponding to x_1 and x_2, respectively. [This may or may not equal the
first jump time, since there may be points (x, ρ) for which q(x, ρ) = 0.]
Let ρ_1 and ρ_2 identify the spatial coordinates of these atoms. Because
the two Poisson random measures have the same intensity measure, the
distributions of ν_1 and ν_2 are the same, and the same is true of ρ_1 and
ρ_2. By the weak sense uniqueness, and the independence of the Wiener
process and the Poisson random measure, the distributions induced by x_1
and x_2 up to and including the times ν_1 and ν_2 are the same. Repeating
the procedure, we obtain equivalence of distributions up to the time of
occurrence of the second atom, and so on. The fact that the occurrence
times of the atoms tend to infinity (w.p.l) implies the equivalence of the
distributions on [0, oo).

Itô's Formula. The following formula follows directly from the definition
of a solution to (5.4) and the corresponding formula for the case of diffusions
with no jumps.

f(x(t)) = f(x(0)) + ∫_0^t (Lf)(x(s))ds + ∫_0^t f'_x(x(s))σ(x(s))dw(s) + J_f(t),

where a(x) = σ(x)σ'(x) as usual,

(Lf)(x) = f'_x(x)b(x) + (1/2)tr [f_xx(x)a(x)] + λ(x) ∫ [f(x + y) - f(x)] Π̄(x, dy),     (5.5)

and

J_f(t) = Σ_{s≤t} [f(x(s)) - f(x(s-))]
        - ∫_0^t λ(x(s)) ∫ [f(x(s) + y) - f(x(s))] Π̄(x(s), dy)ds.

Since J_f(t) is a martingale,

Ef(x(t)) = Ef(x(0)) + E ∫_0^t (Lf)(x(s))ds.

Constrained Diffusions With Jumps. We can define a reflected jump


diffusion that is analogous to the reflected diffusion. Let the domain in
which the process is to stay be G. We need only define what happens if a
jump takes the process out of G.
Suppose that a jump diffusion with components λ, Π(·), and q(·, ·) is
given. Suppose also that the desired behavior for the reflected jump diffusion
is the following. Whenever a jump would take the process out of
G to a point y, then the process is to be "instantaneously" returned to a
point c(y) ∈ G. Assume that c(·) is measurable. This desired behavior of
the process can be achieved by replacing q(·, ·) by q̄(·, ·) defined by

x + q̄(x, ρ) = c(x + q(x, ρ)).

Thus, without loss, we can assume that the following convention is always
in effect. Whenever we are dealing with a reflected jump diffusion, we will
assume that q(·, ·) has been chosen so that

x + q(x, ρ) ∈ G for all x ∈ G and all ρ.

Controlled Diffusions with Jumps. In analogy with Section 1.3, we


have the following definition. Let u( ·) be a control process taking values in
U, let w(·) be a Wiener process, let N(·) be a Poisson random measure, and
assume they are all defined on a common probability space and that w(·)
and N(·) are independent. We say that the control u(·) is nonanticipative
with respect to the pair (w(·), N(·)) if there exists a filtration F_t such
that u(·) is F_t-adapted, w(·) is an F_t-Wiener process, and N(·) is an
F_t-Poisson random measure. We say that u(·) is an admissible control law
with respect to (w(·), N(·)) or that the triple (u(·), w(·), N(·)) is admissible
if u(·) is nonanticipative with respect to (w(·), N(·)).
We consider controlled SDE's of the form

dx(t) = b(x(t), u(t))dt + σ(x(t))dw(t) + dJ(t),     (5.6)

J(t) = ∫_{[0,t]×ℝ^n} q(x(s-), ρ)N(ds dρ),
where, as usual, we interpret this equation via its integrated form. We have
the following definitions.

Weak Existence. Suppose we are given probability distributions Λ and μ
on the sample space of the triple (u(·), w(·), N(·)) and ℝ^k, respectively. We
say that weak existence holds if there exists a probability space (Ω, F, P), a
filtration F_t, an F_t-Wiener process w(·), an F_t-Poisson random measure
N(·), an F_t-adapted control process u(·) [i.e., the triple (u(·), w(·), N(·))
is admissible], and an F_t-adapted process x(·), such that Λ and μ are the
distributions induced by (u(·), w(·), N(·)) and x(0) under P, and such that
x(·) satisfies (5.6) for all t ≥ 0.

Weak Uniqueness. Assume we are given two weak sense solutions
to (5.6). We say that weak uniqueness holds if equality of the joint distributions
of (u_i(·), w_i(·), N_i(·), x_i(0)) under P_i, i = 1, 2, implies the equality
of the distributions of (x_i(·), u_i(·), w_i(·), N_i(·), x_i(0)) under P_i, i = 1, 2.

By arguing in the same way as for the case without control, one can
show that if there is weak sense uniqueness for the case of diffusions with
arbitrary U -valued admissible control laws and initial conditions, then
there is weak sense uniqueness here as well.

For fixed α ∈ U, we define the differential operator for solutions of (5.6)
by

(L^α f)(x) = f'_x(x)b(x, α) + (1/2)tr [f_xx(x)a(x)]
            + λ(x) ∫ [f(x + y) - f(x)] Π̄(x, dy).

For a control process u(·), define the operator L^{u(t)} analogously. If u(·) is an admissible
control law, then Itô's formula for functions of the controlled process
(5.6) continues to hold with this definition of the differential operator.
2
Controlled Markov Chains

The main computational techniques in this book require the approxima-


tion of an original controlled process in continuous time by appropriately
chosen controlled finite state Markov chains. In this chapter, we will de-
fine some of the canonical control problems for the Markov chain models
which will be used in the sequel as "approximating processes." The cost
functions will be defined. The functional equations which are satisfied by
these cost functions for fixed controls, as well as the functional equations
satisfied by the optimal cost functions (the dynamic programming or Bell-
man equation), will be obtained by exploiting the Markov property and
the uniqueness of their solutions is shown, under appropriate conditions.
These are the equations which will have to be solved in order to get the
required approximate solutions to the original control or optimal control
problem. The simplest case, where there is no control or where the control
is fixed, is dealt with in Section 2.1, and the recursive equations satisfied
by the cost functionals are obtained. A similar method is used to get the
recursive equations for the optimal value functions for the controlled prob-
lems. The optimal stopping problem is treated in Section 2.2. This is a
relatively simple control problem, because the only decision to be made
is the choice of the moment at which the process is to be stopped. This
problem will illustrate the basic ideas of dynamic programming for Markov
chains and introduce the fundamental principle of optimality in a simple
way. Section 2.3 concerns the general discounted cost problem. Section 2.4

deals with the optimization problem when the control stops at the first
moment of reaching a target or stopping set. The basic concept of con-
traction map is introduced and its role in the solution of the functional
equations for the costs is emphasized. Section 2.5 gives the results for the
case where the process is of interest over a finite time only. The chapter
contains only a brief outline. Further information concerning controlled or
uncontrolled Markov chain models can be found in the standard references
[11, 54, 84, 88, 126, 151, 155].

2.1 Recursive Equations for the Cost


Let {ξ_n, n < ∞} be a Markov chain with the time independent transition
probabilities p(x, y) = P{ξ_{n+1} = y | ξ_n = x} on a finite state space S. It is
sufficient for our purposes that S be finite, because that will be the case
for the Markov chain models which will be used for the approximations of
the continuous time control problems in the later chapters. Let E_x denote
the expectation of functionals of the chain, given that the initial condition
is x. We will define the cost functionals and give the equations which they
satisfy for many of the cases which will be of interest later in the book.

2.1.1 Stopping on first exit from a given set


Let ∂S ⊂ S be a given subset of the state space such that we stop the
chain at the moment N = min{n : ξ_n ∈ ∂S} of first reaching ∂S. We use
the "boundary" notation ∂S to denote this set, since later on this set will
be a "discretization" of the boundary of the state space of the processes
which we will be approximating, and it is useful to introduce the notation
at this point.
Suppose that

E_x N < ∞, for all x ∈ S - ∂S.     (1.1)

For given functions c(·) and g(·), define the total cost until stopping by

W(x) = E_x [ Σ_{n=0}^{N-1} c(ξ_n) + g(ξ_N) ].     (1.2)

c(·) is called a "running" cost or cost "rate" and g(·) is a "stopping" or
"boundary" cost. It follows from the definitions that W(x) = g(x) for x ∈
∂S. A functional equation for W(·) can be derived by exploiting the Markov

property of {ξ_n, n < ∞} and rewriting (1.2) as follows. For x ∈ S - ∂S,

W(x) = c(x) + E_x { E_x [ Σ_{n=1}^{N-1} c(ξ_n) + g(ξ_N) | ξ_1 ] }
     = c(x) + E_x W(ξ_1)     (1.3a)
     = c(x) + Σ_{y∈S} p(x, y)W(y).

For x ∈ ∂S, we have

W(x) = g(x).     (1.3b)

A Vector Form of (1.3). It is often useful to write (1.3) in a vector form.
There are two convenient methods for doing this, depending on whether
the states in the "stopping" or boundary set ∂S are included or not. Let |S|
denote the number of elements in the set S. For the first method, define the
vector C̃ = {C̃(x), x ∈ S} by C̃(x) = c(x) for x ∈ S - ∂S and C̃(x) = g(x)
for x ∈ ∂S. Define the vector of costs W = {W(x), x ∈ S}. All vectors
are column vectors unless otherwise mentioned. Because our interest in
the chain stops when it first enters the stopping set ∂S, let us "kill" it at
that point, and define the "killed" transition subprobability matrix R̃ =
{r̃(x, y); x, y ∈ S}, where r̃(x, y) = p(x, y) for x ∈ S - ∂S and r̃(x, y) = 0
for x ∈ ∂S and y ∈ S. Then (1.3) can be written as the |S|-dimensional
equation:

W = R̃W + C̃.     (1.4)

By the finiteness (with probability one) of the stopping times which is
implied by (1.1), we have R̃^n → 0. Thus, we can interpret R̃ as a one step
Markov transition matrix for a finite state chain all of whose states are
eventually killed. The probability that the chain is "killed" when the state
is x is 1 - Σ_y r̃(x, y).
Equation (1.4) has a unique solution and its components are given by
(1.2). To see this, simply iterate (1.4) n times to get

W = R̃^n W + Σ_{i=0}^{n-1} R̃^i C̃.

Now let n → ∞. The first term on the right side goes to zero and the limit
of the second term is just the vector of costs with components defined by
(1.2).
For an alternative and often preferred method of writing the cost function
in vector form, one eliminates the states in the boundary set ∂S and uses
the reduced state space S - ∂S, as follows: Define r(x, y) = p(x, y) for
x, y ∈ S - ∂S and define the transition matrix R = {r(x, y); x, y ∈ S - ∂S}.
Set W = {W(x), x ∈ S - ∂S}, and define the vector of cost rates C =
{C(x), x ∈ S - ∂S} by C(x) = c(x) + Σ_{y∈∂S} p(x, y)g(y) for x ∈ S - ∂S.
Then we get the |S - ∂S|-dimensional equation to which (1.2) is the unique
solution, for x ∈ S - ∂S:

W = RW + C.     (1.5)
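Equation (1.5) is a finite linear system and can be solved directly. The following Python sketch assembles R and C for an illustrative chain (a symmetric random walk on {0, ..., M} absorbed at the two endpoints, with c ≡ 1 and g ≡ 0, so that W(x) = E_x N); the chain and the costs are assumptions made only for this example.

```python
import numpy as np

M = 10
interior = list(range(1, M))                      # S - dS; dS = {0, M}
P = np.zeros((M + 1, M + 1))
for x in interior:
    P[x, x - 1] = P[x, x + 1] = 0.5               # symmetric random walk inside
c = np.ones(M + 1)                                # running cost c(x) = 1
g = np.zeros(M + 1)                               # stopping cost on dS

R = P[np.ix_(interior, interior)]                 # r(x,y) = p(x,y) on S - dS
C = c[interior] + P[np.ix_(interior, [0, M])] @ g[[0, M]]
W = np.linalg.solve(np.eye(len(interior)) - R, C) # unique solution of W = RW + C
print(W)   # equals x*(M - x), the classical mean absorption time
```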

2.1.2 Discounted cost

Let β > 0, and define the total discounted cost

W(x) = E_x Σ_{n=0}^∞ e^{-βn} c(ξ_n).     (1.6)

Here there is no explicit "stopping" or "boundary" set. Later, we will allow
the discount factor β to depend on the current state. Following the procedure
used for (1.2), we can get a functional equation for the W(x) of (1.6)
by using the Markov property and writing:

W(x) = E_x { E_x e^{-β} [ Σ_{n=1}^∞ e^{-β(n-1)} c(ξ_n) | ξ_1 ] } + c(x)
     = e^{-β} E_x W(ξ_1) + c(x)     (1.7)
     = e^{-β} Σ_y p(x, y)W(y) + c(x).

Next, let us modify (1.6) by introducing an explicit stopping set ∂S as
in Subsection 2.1.1, and suppose that the accumulation of the running cost
stops at the time N of first entrance into ∂S, and a boundary or stopping
cost g(·) is added as in (1.2) above. That is, the total cost is

W(x) = E_x [ Σ_{n=0}^{N-1} e^{-βn} c(ξ_n) + e^{-βN} g(ξ_N) ].     (1.8)

Then the functional equation for (1.8) is

W(x) = { e^{-β} E_x W(ξ_1) + c(x),   x ∈ S - ∂S
       { g(x),                        x ∈ ∂S.     (1.9)

Define the "discounted" and degenerate transition matrix R = {r(x, y), x, y
∈ S - ∂S} by r(x, y) = e^{-β} p(x, y). Define the cost vector C = {C(x), x ∈
S - ∂S} by C(x) = c(x) + e^{-β} Σ_{y∈∂S} p(x, y)g(y) for x ∈ S - ∂S. Define
the vector W = {W(x), x ∈ S - ∂S}. Then we can write (1.9) in the vector
form

W = RW + C.     (1.10)

Equation (1.10) has a unique solution whether or not E_x N < ∞ for all
x, due to the fact that the discounting implies that R^n → 0.

State Dependent Discount Factor. In the Markov chain problems


which arise as the numerical approximations to the discounted cost contin-
uous time control problems of interest in this book, it is often the case that
the discount factor depends on the current state (and possibly on the cur-
rent control action), even if the discount factor in the original continuous
time problem was not state dependent. The costs (1.6) or (1.8) are easily
modified to account for this possibility, and in the rest of this chapter, we
will use a state (or state and control, if appropriate) dependent discount
factor. For β(x) > 0, let e^{-β(x)} be the discount factor when the state is x.
The strict positivity of β(x) will be dropped below for the so-called instantaneous
reflecting states. The appropriate modification of the cost used in
(1.8) is

W(x) = E_x Σ_{n=0}^{N-1} exp [ - Σ_{i=0}^{n-1} β(ξ_i) ] c(ξ_n) + E_x exp [ - Σ_{i=0}^{N-1} β(ξ_i) ] g(ξ_N).     (1.11)

Also, (1.9) and (1.10) continue to hold, but with β replaced by β(x) in the
line that calculates W(x).

A Special Case: A Reflecting Boundary. The term "reflecting bound-


ary" is used in a loose sense, because there is no special geometric structure
assumed on the state space here. But the terminology and formulas will be
useful in later chapters when the Markov chains are obtained as "approxi-
mating" processes for the "physical" continuous time controlled processes.
These will be defined on some set in a Euclidean space, and S will be a "dis-
cretization" of that set. These continuous parameter processes might have
a reflecting boundary, and the so-called "reflecting set" (to be called ∂S^+
below) for the chain will be an appropriate "discretization" of the reflecting
boundary for the original continuous time problem. For the continuous
time problem, the reflection is assumed to be "instantaneous." (Recall the
model of the Skorokhod Problem in Chapter 1.) The transition probability
and cost for the approximating chains will "imitate" this behavior. This is
the reason that we do not discount the time spent at the so-called reflecting
states of the chain. When there is an absorbing or stopping boundary
∂S, as well as a reflection set ∂S^+, it is always assumed that the two are
disjoint. An appropriate modification of the discounted cost function will
be defined in the next two paragraphs.

The Discounted Cost Function if there is a Reflecting Boundary.


A reflecting set ∂S^+ ⊂ S is any selected set which satisfies

P_x{ξ_n ∈ ∂S^+, all n < ∞} = 0, and β(x) = 0 for all x ∈ ∂S^+.     (1.12)

The above equation guarantees that the chain cannot get stuck on the
reflecting boundary. Suppose that the cost function W(x) of interest is still
(1.11) but with β(x) > 0 for x ∈ S - ∂S^+. Then the recursive equation for
W(x) is

W(x) = { e^{-β(x)} E_x W(ξ_1) + c(x),   x ∈ S - ∂S - ∂S^+
       { g(x),                           x ∈ ∂S     (1.13)
       { E_x W(ξ_1) + c(x),              x ∈ ∂S^+.

Equation (1.13) has a unique solution due to the discounting and (1.12).
Let us put the cost equation (1.13) into vector form. Define the discounted
substochastic matrix R = {r(x, y); x, y ∈ S - ∂S} by

r(x, y) = { e^{-β(x)} p(x, y),   x ∈ S - ∂S - ∂S^+
          { p(x, y),             x ∈ ∂S^+.     (1.14)

Define the cost rate vector C by

C(x) = { c(x) + e^{-β(x)} Σ_{y∈∂S} p(x, y)g(y),   x ∈ S - ∂S - ∂S^+
       { c(x) + Σ_{y∈∂S} p(x, y)g(y),             x ∈ ∂S^+.     (1.15)

Then (1.10) holds with these new definitions. Because R^n → 0, the solution
to (1.10) is still unique.
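The role of the reflecting states in (1.13)-(1.15) is easy to see numerically: their rows of R carry no discount factor, but since the chain cannot remain on ∂S^+ forever, R^n → 0 still holds and the fixed-point iteration W ← RW + C converges. The Python sketch below uses an illustrative chain on {0, ..., M} with ∂S = {M}, ∂S^+ = {0} (reflection pushes the state to 1), constant β, c ≡ 1, and g ≡ 0; all of these choices are assumptions made for the example.

```python
import numpy as np

M, beta = 10, 0.05
P = np.zeros((M + 1, M + 1))
P[0, 1] = 1.0                                     # reflecting state 0 is sent to 1
for x in range(1, M):
    P[x, x - 1] = P[x, x + 1] = 0.5
c, g = np.ones(M + 1), np.zeros(M + 1)

states = np.arange(M)                             # S - dS = {0, ..., M-1}
disc = np.where(states == 0, 1.0, np.exp(-beta))  # no discounting on dS+
R = disc[:, None] * P[np.ix_(states, states)]     # (1.14)
C = c[states] + disc * (P[np.ix_(states, [M])] @ g[[M]])   # (1.15)

W = np.zeros(len(states))
for _ in range(2000):                             # converges since R^n -> 0
    W = R @ W + C
print(W)
```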

2.1.3 Average cost per unit time

Suppose that the state space S is a single connected aperiodic class: That
is, for large n, the n-step transition probabilities satisfy p^n(x, y) > 0 for all
x, y. Then there is a unique invariant measure π = {π(x), x ∈ S} which we
write as a row vector, and which satisfies the equation π = πP [84]. Define
the "stationary" cost value γ and "stationary expectation" operator E_π by

γ = E_π c(ξ) = Σ_{x∈S} π(x)c(x).     (1.16)

By the ergodic theorem for Markov chains [23, 84], for each x

(1/n) Σ_{i=0}^{n-1} c(ξ_i) → γ w.p.1 as n → ∞.

There is an auxiliary function W(·) such that the pair (W(·), γ) satisfies
the equation [11, 88, 126]

W(x) + γ = E_x W(ξ_1) + c(x).     (1.17)

In vector form, (1.17) is (W is an |S|-dimensional vector here)

W + eγ = PW + C,     (1.18)

where e = (1, ..., 1) is the column vector all of whose components are
unity, P = {p(x, y); x, y ∈ S}, and C = {c(x); x ∈ S}. On the other hand,
suppose that (1.18) holds for some pair (W, γ). Premultiply each side of
(1.18) by π and use the fact that π = πP to get (1.16). One choice for W
is

Σ_{i=0}^∞ P^i (C - eγ),

which is well defined because P^i C → eγ at a geometric rate under our
conditions.
An alternative way of showing that the γ in (1.17) equals the value in
(1.16) involves iterating (1.17) n times to get

W = P^n W + Σ_{i=0}^{n-1} P^i (C - eγ).

Now divide by n and let n → ∞ to get (1.16) again. The function W(·) in
(1.18) is not unique. If any vector of the form ke is added to W(·), where
k is a real number, then (1.18) still holds for the new value. We will return
to a discussion of the ergodic cost problem for Markov chains in Chapter
7, where additional conditions for the existence of solutions to (1.17) are
given.
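For a small chain the quantities in (1.16)-(1.18) can be computed directly: π is the normalized left eigenvector of P for the eigenvalue 1, γ = Σ_x π(x)c(x), and one admissible W is the (truncated) sum Σ_i P^i(C - eγ). The transition matrix and cost vector in the Python sketch below are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
C = np.array([1.0, 4.0, 2.0])

vals, vecs = np.linalg.eig(P.T)                   # left eigenvectors of P
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()                                # invariant measure, pi = pi P
gamma = pi @ C                                    # (1.16)

W = np.zeros(3)
term = C - gamma                                  # C - e*gamma
for _ in range(500):                              # W = sum_i P^i (C - e*gamma), truncated
    W += term
    term = P @ term
print(gamma, np.allclose(W + gamma, P @ W + C))   # checks (1.18)
```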

2.1.4 Stopping at a given terminal time


Consider the case where the interest in the chain stops at either a given
nonrandom time M or at the first time N that the chain enters a selected
stopping set ∂S ⊂ S, whichever comes first. Let E_{x,n} denote expectation of
functionals of the chain {ξ_i, i ≥ n}, given that ξ_n = x. For given functions
c(·) and g(·), let the cost starting at time n ≤ M be defined by

W(x, n) = E_{x,n} [ Σ_{i=n}^{(N∧M)-1} c(ξ_i) + g(ξ_{N∧M}, N ∧ M) ].     (1.19)

Using the Markov property, as done in (1.3) or (1.7), for n < M and
x ∈ S - ∂S we can write (1.19) as

W(x, n) = c(x) + E_{x,n} { E_{x,n} [ Σ_{i=n+1}^{(N∧M)-1} c(ξ_i) + g(ξ_{N∧M}, N ∧ M) | ξ_{n+1} ] }
        = E_{x,n} W(ξ_{n+1}, n + 1) + c(x),     (1.20)

where we use ξ_{n+1} to denote the state at absolute time n + 1. Note that
we are not using the "shifting initial time" terminology which is frequently
used (e.g., as in [49]) when working with Markov chains, but rather an
absolute time scale. The boundary conditions are

W(x, n) = g(x, n), x ∈ ∂S or n = M.     (1.21)

Define W(n) = {W(x, n), x ∈ S - ∂S}. Then in vector form, (1.20) is

W(n) = RW(n + 1) + C, n < M,     (1.22)

where R and C are the same as above (1.5) and the terminal boundary
condition is W(M) = {g(x, M), x ∈ S - ∂S}.
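The backward recursion (1.22) is a short loop. In the Python sketch below the chain is again the absorbed symmetric random walk, the running cost is c ≡ 1, and the boundary and terminal costs are zero, so that W(x, 0) = E_{x,0}(N ∧ M); these choices are assumptions made only for the illustration.

```python
import numpy as np

K, M = 10, 25                                     # state space {0,...,K}, horizon M
interior = list(range(1, K))                      # S - dS; dS = {0, K}
P = np.zeros((K + 1, K + 1))
for x in interior:
    P[x, x - 1] = P[x, x + 1] = 0.5
R = P[np.ix_(interior, interior)]
C = np.ones(len(interior))                        # c = 1; boundary cost g = 0

W = np.zeros(len(interior))                       # W(M) = {g(x, M)} = 0
for n in range(M - 1, -1, -1):
    W = R @ W + C                                 # (1.22): W(n) = R W(n+1) + C
print(W)                                          # W(x, 0) = E_{x,0} (N ^ M)
```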

2.2 Optimal Stopping Problems


One of the simplest control problems is the optimal stopping problem,
where the only two possible control actions at any time are to stop the
process or to let it continue (if it has not yet been stopped). As in Section
2.1, let {ξ_n, n < ∞} be a Markov chain on a finite state space S with
time independent transition probabilities p(x, y). The cost of control is the
expectation of some function of the path followed until the stopping time.
A quite complete development under general conditions appears in [137].
Examples occur in the theory of hypothesis testing, where one wishes to
decide which one of several alternative hypotheses is the true one on the
basis of a sequence of sample observations. At the end of the sampling, a
decision is made concerning which hypothesis is the true one. Under each
given hypothesis, it is assumed that the components of the sequence of
samples are mutually independent. The basic question concerns the num-
ber of samples to take, and the decision to continue sampling will depend
on the values of the already available samples. There is a cost assigned
to each sample taken, and also a cost associated with an incorrect final
decision concerning which of the two hypotheses is the true one. For this
example, after each new sample is taken, the conditional probability that
each alternative hypothesis is the true one is recomputed, and a decision
is made on whether to continue sampling or not. Sampling continues until
the additional information obtained concerning the true hypothesis is not
worth the additional cost of sampling. Other examples of optimal stopping

problems occur in the timing of the buying or selling of assets and in the
theory of reliability and maintenance, and additional examples are given in
the references on stochastic control.

Definition. Let N denote a random variable with values in the set [0, oo].
N will be the time at which the chain is stopped. If the chain is never
stopped for some sample path, then N = oo on that path. In the sequential
sampling problem above, the value N = oo would occur if sampling never
stopped. We say that N is an admissible stopping time or simply a stopping
time for the chain {ξ_n, n < ∞} if it "does not depend on the future." More
precisely, N is an admissible stopping time if for any n, m > n, and function
F(·), we have

E [ F(ξ_m) | ξ_i, i ≤ n; N = n ] = E [ F(ξ_m) | ξ_n ].

Loosely speaking, for any n, the event {N = n} might depend only on the
past and current values of the state {ξ_i, i ≤ n}, or it might depend on other
quantities as well, as long as the above "Markov property" is preserved.

2.2.1 Discounted cost

A natural extension of the discounted cost function (1.8) can be associated
with an admissible stopping time. In particular, let β > 0 and define

W(x, N) = E_x [ Σ_{n=0}^{N-1} e^{-βn} c(ξ_n) + e^{-βN} g(ξ_N) ],     (2.1)

for some given functions c(·) and g(·). In Subsection 2.1.2, N was the first
entrance time into the a priori selected set ∂S. If the state dependent
discount factor β(x) > 0 is used, then replace (2.1) by

W(x, N) = E_x Σ_{n=0}^{N-1} exp [ - Σ_{i=0}^{n-1} β(ξ_i) ] c(ξ_n) + E_x exp [ - Σ_{i=0}^{N-1} β(ξ_i) ] g(ξ_N).     (2.2)

Before writing the functional equation for the optimal cost, let us consider
the special case where N is a stopping time which is a priori defined to
be the first entrance time of the chain into a selected set and use the Markov
property to get the functional equation for the associated cost. In particular,
suppose that there is a set S_0 ⊂ S such that N = min{n : ξ_n ∈ S_0}.
Then the functional equation (1.9) holds for the cost (2.2). In particular,

W(x) = { e^{-β(x)} E_x W(ξ_1) + c(x),   x ∈ S - S_0
       { g(x),                           x ∈ S_0.     (2.3)

We use both notations ∂S and S_0 because ∂S will be a set in the state
space which will approximate a "boundary," whereas S_0 is an unknown
set, which is to be determined by solving the optimal stopping problem.

A Recursive Equation for the Optimal Cost. Define the infimum of
the costs

V(x) = inf_N W(x, N),

where the infimum is over all the admissible stopping times for the chain.
Owing to the discounting, V(x) is bounded. We now describe the fundamental
procedure for obtaining a functional equation for V(x). Suppose
that the current time is n, the current state is ξ_n = x, and that the process
has not yet been stopped. We need to decide whether to stop (and attain
an immediate and final cost g(x)) or to continue. If we allow the process
to continue, whether or not that decision is optimal, then there is an immediately
realized cost c(x) as well as costs to be realized in the future. If
we allow the process to continue, then the next state is ξ_{n+1}. Suppose that
we continue, but follow an optimal decision policy from the next time on.
Then, by the definition of V(x), the optimal discounted cost as seen from
the next time is V(ξ_{n+1}). Its mean value, discounted to the present, is

e^{-β(x)} E[V(ξ_{n+1}) | ξ_n = x] = e^{-β(x)} Σ_y V(y)p(x, y) = e^{-β(x)} E_x V(ξ_1).

Thus, if we continue at the current time and then act optimally from the
next time on, the total discounted cost, as seen from the present time, is
e^{-β(x)} E_x V(ξ_1) + c(x).
We still need to choose the decision to be taken at the present time. The
optimal decision to be taken now is the decision which attains the minimum
of the costs over the two possibilities, which are: (1) stop now; (2) continue
and then act optimally in the future. Hence, the optimal cost must satisfy
the equation

V(x) = min[g(x), e^{-β(x)} E_x V(ξ_1) + c(x)].     (2.4)

If the two terms in the bracket in (2.4) are equal at some x, then at that x
it does not matter whether we stop or continue. The term in (2.4) which is
the minimum tells us what the optimal action is. The set S_0 = {x : V(x) =
g(x)} is known as the stopping set. For x ∉ S_0, (2.4) implies that we should
continue; otherwise, we should stop.

The Principle of Optimality. The method just used for the derivation
of (2.4) is known as the principle of optimality. It is the usual method
for getting the functional equations which are satisfied by optimal value
functions for control problems for Markov process models when the control
at any time can depend on the state of the process at that time. It will also

be used in a formal way in Chapter 3 to get partial differential equations


which are formally satisfied by the value functions for the continuous time
problems. Let the current state be x. The principle of optimality basically
asserts that whatever we do now at state x, in order to get the optimal
value function (least cost), we will have to use an optimal policy from the
next step on, from whatever position the system finds itself in at that time.
The distribution of the next value of the state of the chain depends on the
current control action and state. The total expected cost from the present
time on, given the current state and control action, is just the cost attained
at the present time plus the future expected cost (with discounting used,
if appropriate). Then, the current value of the control must be selected to
minimize the sum of these costs. Other examples of the use of the principle
of optimality appear in the next section.

Rewriting (2.4) in Terms of a Control. Equation (2.4) is known as


the dynamic programming equation or Bellman equation. For the discounted
cost problem of this subsection, it has a unique solution as will be shown
below (2.6). It is easier to see this if we rewrite the equation in terms of a
controlled transition function and then put it into vector form. The control
here is trivial, but the notation allows us to get a convenient expression,
and to set the stage for the more complicated problems of the next section.
Define the control space U = {0, 1}, with generic value α. The control value
α = 0 is used to denote that we continue the process, and the control value
α = 1 is used to denote that we stop the process. Define the controlled
transition function p(x, y|α) and cost rate function c(x, α) by

p(x, y|α) = { p(x, y),   α = 0
            { 0,         α = 1,

c(x, α) = { c(x),   α = 0
          { g(x),   α = 1.

If there is a U-valued function u(·) on S such that the transition probabilities
are p(x, y|u(x)), then the control is said to be pure Markov. Such
a control is a feedback control which does not depend on time. For such a
control, the stopping set is S_0 = {x : u(x) = 1}. Let u_n denote the decision
at time n. Then u_n = 0, for n < N, and u_N = 1. If there are numbers P_x
such that for all n

P{u_n = 1 | ξ_i, i ≤ n, u_i = 0, i < n} = P_{ξ_n},

where P_x ∈ (0, 1) for some x, then the stopping rule or control is said to be
randomized Markov. If the control or (equivalently) the stopping decision
is determined by a pure Markov control or decision function u(·), then we
write the associated cost as W(x, u) instead of as W(x, N).
We next write the expression for the minimum cost in vector notation.
For a feedback control u(·), define the cost rate vector $C(u) = \{c(x, u(x)),\ x \in S\}$,
the controlled discounted transition function
$$r(x, y \mid u(x)) = e^{-\beta(x)} p(x, y \mid u(x)),$$
and the degenerate transition matrix $R(u) = \{r(x, y \mid u(x));\ x, y \in S\}$. Let
$W(u) = \{W(x, u),\ x \in S\}$ denote the vector of total discounted costs
(2.2) under the feedback control u(·), and let $V = \{V(x),\ x \in S\}$ denote
the vector of least costs. Then we can write (2.4) in the following $|S|$-
dimensional vector form:
$$V = \min_{u(x) \in U}\, [R(u)V + C(u)]. \tag{2.5}$$
In (2.5) and in future minimizations of vector valued functions, the min-
imum is taken line by line. That is, in the xth line of (2.5), we minimize
over all the values of $u(x) \in U$.

Existence of a Solution to (2.5) and an Optimal Feedback Control.
Let $\bar u(\cdot)$ denote some particular control at which the minimum in (2.5) is
taken on. Thus we are assuming the existence of the pair $(V, \bar u)$. We will
show that it is an optimal feedback control. Letting u(·) be any other
feedback control, the minimizing operation in (2.5) yields
$$V = R(\bar u)V + C(\bar u) \le R(u)V + C(u), \tag{2.6}$$


where the inequality is component by component. Iterating (2.6} and using
the fact that (due to the discounting)

(2.7}

we have

L Rn(u)C(u) = W(u)::; L Rn(u)C(u) = W(u},


00 00

V= (2.8}
n=O n=O

where the inequality is component by component. Thus, V is indeed the
minimal (over all feedback controls) cost vector, $\bar u(\cdot)$ is an optimal decision
or control function, and the solution to (2.5) is unique because the minimal
cost is unique. Via a similar argument, it can be seen that for any $V_0$, the
iteration
$$V_{n+1} = \min_{u(x) \in U}\, [R(u)V_n + C(u)]$$
converges to a solution to (2.5). The initial condition is irrelevant due to
the discounting.
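
The iteration just described is easy to carry out numerically. The following is a minimal sketch (not from the book), assuming a small finite state space indexed $0, \ldots, |S|-1$, a hypothetical stochastic matrix `P` for the continuation dynamics, cost vectors `c` and `g`, and a state-dependent discount rate `beta`.

```python
import numpy as np

def stopping_value_iteration(P, c, g, beta, tol=1e-10, max_iter=100_000):
    """Successive approximation for V = min(g, e^{-beta} P V + c), componentwise."""
    disc = np.exp(-beta)                 # e^{-beta(x)} for each state x
    V = g.astype(float).copy()           # any initial condition works
    for _ in range(max_iter):
        V_new = np.minimum(g, disc * (P @ V) + c)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    stop_set = np.isclose(V, g)          # states where stopping attains the minimum
    return V, stop_set
```

Because of the discounting, the map is a contraction and the sweep converges geometrically from any starting vector, mirroring the argument above.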

A Note on the "Contraction" (2.7). Note the critical role that (2.7)
played. If the n-step discounted transition probabilities did not go to zero,

we could not have obtained the uniqueness and possibly not even the in-
terpretation of the solution to (2.5) as the minimum cost vector. A similar
property is needed for the general control problem with a Markov chain
model and will be dealt with in Section 2.4 in more detail. Such properties
are also very useful in dealing with the convergence proofs for numerical
methods for solving equations such as (2.5). A proof of convergence in
a setting where the contraction property is absent is given in Subsection
15.3.3.

Including an Obligatory Stopping Set. There is a variation of the
optimal stopping problem where, irrespective of where else we might
decide to stop, we must stop on first reaching some a priori selected set
$\partial S \subset S$, and where we attain a cost of g(x) if we first enter $\partial S$ at the
point x. Then (2.4) still holds for $x \in S - \partial S$, but we have the boundary
condition $V(x) = g(x)$, $x \in \partial S$.

Remark. The above discussion showed only that the minimizing control in
(2.4) or (2.5) is the optimal control/stopping decision function with respect
to the class of comparison stopping rules which are also determined by first
entrance times into sets in the state space or, equivalently, by feedback
control laws, and that V(x) is the least cost only in this class. But it is
also true that the optimality of $\bar u(\cdot)$ in this class implies optimality with
respect to all admissible stopping times. The proof is essentially the same
as used above for the pure Markov rules, except that the alternatives $\bar u(\xi_n)$
are replaced by a sequence of appropriate "admissible" decision variables
$\{u_n\}$ and the details are omitted [88, 138].

2.2.2 Undiscounted cost


In the absence of discounting [i.e., $\beta(x) = 0$], (2.2) need not have a well de-
fined meaning without some additional conditions. The essential condition
used in the arguments of the last subsection was (2.7), where R(u) was the
"effective" transition matrix for the controlled chain. The condition (2. 7)
held due to the discounting, irrespective of the values of p(x, y) or of the
choice of control function.

Alternative Conditions. Define $c_0 = \min_x c(x)$, and suppose that $c_0 > 0$.
Then we need only consider stopping times N which satisfy
$$E_x N \le \frac{2\sup_y |g(y)|}{c_0}. \tag{2.9}$$
To see this, suppose that N is an admissible stopping time which violates
(2.9). Then $c_0 E_x N > 2\sup_y |g(y)|$ and the associated cost satisfies
$$W(x, N) \ge c_0 E_x N + E_x g(\xi_N) > 2\sup_y |g(y)| + E_x g(\xi_N) \ge g(x).$$
Thus, it would have been preferable to stop at the initial time rather than
at N. Hence, we can assume (2.9). This implies that V(x) is finite. Also,
$R^n(u) \to 0$ for all pure Markov u(·) for which (2.9) holds. By this result
and the principle of optimality, V(·) satisfies
$$V(x) = \min\left[g(x),\; E_x V(\xi_1) + c(x)\right]. \tag{2.10}$$

An obligatory stopping set can also be added and the comments made in
the paragraph just above Subsection 2.2.2 hold for this case too.

2.3 Discounted Cost


Terminology. In this section, we treat the general discounted cost prob-
lem for a Markov chain model. The principle of optimality introduced in
the last section will be used to obtain a functional equation for the minimal
cost. Let U, the control action set, be a compact set in some topological
space, with generic variable $\alpha$. The actual space is unimportant, although
in applications it is generally either a finite set or a subspace of a Eu-
clidean space. We say that $\{\xi_n, n < \infty\}$ is a controlled Markov chain on
the finite state space S if the transition probabilities are functions of a con-
trol variable which takes values in the set U. The "controlled" transition
probabilities will be written as $p(x, y \mid \alpha)$, where $\alpha$ will usually depend on x.
For reasons of notational simplicity, the controlled transition probabilities
will not depend explicitly on time. At any time n, the control action is a
random variable, which we denote by $u_n$. Let $u = (u_0, u_1, \ldots)$ denote the
sequence of U-valued random variables which are the control actions at
times 0, 1, .... We say that u is admissible if the Markov property continues
to hold under use of the sequence u, namely, that
$$P\{\xi_{n+1} = y \mid \xi_i, u_i, i \le n\} = p(\xi_n, y \mid u_n).$$
Let $E_x^\alpha$ denote the expectation of functionals of $\xi_1$ given $\xi_0 = x$, $u_0 = \alpha$.
For an admissible control sequence u, let $E_x^u$ denote the expectation of
functionals of the chain $\{\xi_n, n < \infty\}$ given that the initial condition is x
and which uses the transition probabilities $p(z, y \mid u_n)$ at time n if $\xi_n = z$.
There is an abuse of notation concerning the use of u which is convenient
and which should not cause any confusion. If there is a function u( ·) such
that $u_n = u(\xi_n)$, then we refer to the control as a feedback or pure Markov
policy and use the notation u to refer to both the function and the sequence
of actual control actions.
In the last section, we allowed the discount factor to depend on the state.
Throughout the rest of the chapter, we allow it to depend on both the state
and control, and suppose that $\beta(\cdot)$ is a nonnegative function of x and $\alpha$ and
is continuous in $\alpha$ for each x. We will also suppose that the cost function
c(x, ·) and the transition probability $p(x, y \mid \cdot)$ are continuous functions of
the control parameter $\alpha \in U$ for each x, y. Now, suppose that $\beta(x, \alpha) > 0$
for each x, $\alpha$. For an admissible control sequence u, define the cost
$$W(x, u) = E_x^u \sum_{n=0}^{\infty} \exp\left[-\sum_{i=0}^{n-1} \beta(\xi_i, u_i)\right] c(\xi_n, u_n). \tag{3.1}$$

The modifications which are needed if we are obliged to stop when first
reaching a selected "boundary" set $\partial S$ will be stated below.
Let V(x) denote the infimum of the costs W(x, u) over all admissible
control sequences. V(x) is finite for each x due to the discounting and
satisfies the dynamic programming equation
$$V(x) = \min_{\alpha \in U}\left[e^{-\beta(x,\alpha)} E_x^\alpha V(\xi_1) + c(x, \alpha)\right]. \tag{3.2}$$
Equation (3.2) is easy to derive via use of the principle of optimality, as
follows. Given the current state x, we use a control action $\alpha$. This value of
$\alpha$ must be chosen in an optimal way. However, for any choice of $\alpha$, there
is a current "running" cost of $c(x, \alpha)$. Suppose that from the next step
on, we choose the control value in an optimal manner. Then whatever the
next state and current control action, the expected value (as seen from the
present state and time) of the discounted future cost is $e^{-\beta(x,\alpha)} E_x^\alpha V(\xi_1)$.
The total discounted cost as seen from the present time, and given that the
current control action is $\alpha$, is $c(x, \alpha) + e^{-\beta(x,\alpha)} E_x^\alpha V(\xi_1)$. Now we need to
choose the minimizing value of $\alpha$, from which (3.2) follows.
There is a function $\bar u(x)$ which attains the minimum in (3.2). Reason-
ing as in the paragraph below (2.4) shows that the solution to (3.2) is
unique and that $\bar u(\cdot)$ is indeed an optimal control with respect to all ad-
missible control sequences. We will repeat some of the details. The argu-
ment to follow is essentially a verification of the principle of optimality.
Let u(·) be any feedback control. Define the discounted transition ma-
trix $R(u) = \{e^{-\beta(x, u(x))} p(x, y \mid u(x));\ x, y \in S\}$. Define the cost rate vector
$C(u) = \{c(x, u(x)),\ x \in S\}$, and define the cost vector W(u) and minimum
cost vector V as done in connection with (2.4). Then the vector version of
(3.2) is
$$V = \min_{u(x) \in U}\, [R(u)V + C(u)]. \tag{3.3}$$
Then, as in Section 2.2, (3.3) implies (where the inequality is for each
component)
$$V = R(\bar u)V + C(\bar u) \le R(u)V + C(u), \tag{3.4}$$
where $\bar u(x)$ is the minimizing control in (3.2) or in the xth line of (3.3) and
u(·) is any other feedback control. As in the last section, the discounting
implies (2.7) and that (2.8) holds, from which follows both the uniqueness
of the solution to (3.3) and the fact that the solution is the minimum value
function over all feedback control alternatives. A similar proof shows that
$\bar u(\cdot)$ is optimal with respect to all admissible control sequences.
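
As an illustration only (not from the text), the vector equation (3.3) can be solved by the same successive approximation used in Section 2.2. The sketch below assumes a finite control set; for each action `a` it stores a hypothetical transition matrix `p[a]`, cost vector `c[a]`, and discount-rate vector `beta[a]`.

```python
import numpy as np

def discounted_value_iteration(p, c, beta, tol=1e-10, max_iter=100_000):
    """Iterate V <- min_a [exp(-beta(.,a)) * (p[a] @ V) + c(.,a)] until convergence."""
    actions = list(p.keys())
    V = np.zeros(p[actions[0]].shape[0])
    for _ in range(max_iter):
        Q = np.stack([np.exp(-beta[a]) * (p[a] @ V) + c[a] for a in actions])
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = np.array(actions)[Q.argmin(axis=0)]   # a minimizing feedback control
    return V, policy
```

The returned `policy` is a feedback (pure Markov) control, as the theory above leads one to expect.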

A Stopping Set. Suppose that there is a chosen "boundary" set $\partial S$ such
that we must stop on first contact with it, and with stopping cost g(x)
if first contact is at x. Let N be the first time of contact. Then (3.1) is
modified as:
$$W(x, u) = E_x^u \sum_{n=0}^{N-1} \exp\left[-\sum_{i=0}^{n-1} \beta(\xi_i, u_i)\right] c(\xi_n, u_n)
+ E_x^u \exp\left[-\sum_{i=0}^{N-1} \beta(\xi_i, u_i)\right] g(\xi_N). \tag{3.5}$$
Then the dynamic programming equation is (3.2) for $x \notin \partial S$, and with the
boundary condition $V(x) = g(x)$, $x \in \partial S$. All of the previous results hold
here also, and a "reflecting boundary" $\partial S^+$ can be added as in Section 2.1.

2.4 Control to a Target Set and Contraction


Mappings
If $\beta(x, \alpha) > 0$, then the discount factor $e^{-\beta(x,\alpha)}$ guarantees that (3.2) and
(3.3) have a unique solution which is the minimum value function. In the
absence of discounting, one needs to be more careful. Property (2.7) is
required for the candidate $\bar u(\cdot)$ for the optimal feedback control and for
appropriate comparison controls u(·), and some condition which guarantees
this is needed. We will describe the general requirement. Let $\{\xi_n, n < \infty\}$
be a controlled Markov chain as in the last section. Let $\partial S \subset S$ be a selected
stopping or boundary set, so that we are obliged to stop the control process
at time N, the moment of first entrance into $\partial S$. For an admissible control
sequence u, define the undiscounted cost
$$W(x, u) = E_x^u \sum_{n=0}^{N-1} c(\xi_n, u_n) + E_x^u\, g(\xi_N). \tag{4.1}$$
Let us define the optimal cost $V(x) = \inf_u W(x, u)$, where the infimum is
over all admissible control sequences. In the absence of discounting, neither
the function W(x, u) in (4.1) nor the V(x) need be well defined or finite.
Let us proceed formally for the moment so that we can see what is required.
By a formal use of the principle of optimality, the dynamic programming
equation for the cost function (4.1) is
$$V(x) = \begin{cases} \inf_{\alpha \in U}\left[E_x^\alpha V(\xi_1) + c(x, \alpha)\right], & x \in S - \partial S \\ g(x), & x \in \partial S. \end{cases} \tag{4.2}$$
Next, working as in the previous sections, we put (4.2) into vector form.
For a feedback control u(·), define the reduced transition matrix $R(u) =
\{r(x, y \mid u(x)),\ x, y \in S - \partial S\}$, where
$$r(x, y \mid \alpha) = p(x, y \mid \alpha), \quad x \in S - \partial S. \tag{4.3}$$
Define the cost rate vector C(u) with components
$$C(x, u) = c(x, u(x)) + \sum_{y \in \partial S} p(x, y \mid u(x))\, g(y), \quad x \in S - \partial S.$$
Then, defining $V = \{V(x),\ x \in S - \partial S\}$, we can write (4.2) as
$$V = \min_{u(x) \in U}\, [R(u)V + C(u)]. \tag{4.4}$$

In (4.4), the minimization is over each component separately, as usual.

Definition. Let R be a square matrix. We say that R is a contraction if
$R^n \to 0$ as $n \to \infty$.

Note that the R(u) which appeared in (2.5) and (3.3) are contractions for
all feedback u(·). If R(u) is a contraction for each feedback control u(·),
then the arguments of the previous two sections can be repeated to show
that the solution to (4.2) or (4.4) is unique, that it is the optimal cost, and
that the minimizing feedback control is an optimal control with respect to
all admissible controls.

A Sufficient Condition for a Contraction, and Discussion. In the


applications of interest in this book, it is often the case that R( u) is a
contraction for each feedback control u(·). A useful and frequently used
alternative condition is the following:

A4.1. There is $c_0 > 0$ such that $c(x, \alpha) \ge c_0$, and a feedback control $u_0(\cdot)$
such that $R(u_0)$ is a contraction.

In this case, the positivity of the running cost rate c(·) implies that W(x, u)
might be unbounded for some controls. But the existence of $u_0(\cdot)$ in (A4.1)
guarantees that there is a feedback control $\bar u(\cdot)$ which is optimal with re-
spect to all admissible controls and such that $R(\bar u)$ is a contraction and that
(4.4) holds. The proofs of such assertions can be found in many places in the
literature on control problems with Markov chain models [11, 88, 126, 132].
It is important to recall that R(u) being a contraction is equivalent
to $E_x^u N < \infty$ for all x because the state space is finite, where N is de-
fined above (4.1). This latter interpretation is useful because it can often
be checked by inspection of the transition matrices, for particular controls
u(·). In particular, for each $x \notin \partial S$ there needs to be a chain of states of
positive probability under u(·) which leads to some state in $\partial S$.
The concept of contraction plays a fundamental role in the proofs of
convergence of the numerical methods used in Chapter 6.
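
Both viewpoints just described can be checked numerically. The sketch below is purely illustrative (the inputs are hypothetical): it tests $R(u)^n \to 0$ through the spectral radius, and separately checks the positive-probability-path condition by a backward reachability sweep from $\partial S$.

```python
import numpy as np

def is_contraction(R):
    """R^n -> 0 for a square matrix R iff its spectral radius is < 1."""
    return np.max(np.abs(np.linalg.eigvals(R))) < 1.0

def boundary_reachable(P, boundary_states):
    """True if every state has a positive-probability path into the boundary set,
    which on a finite chain is equivalent to E_x N < infinity for all x."""
    n = P.shape[0]
    reach = set(boundary_states)
    changed = True
    while changed:
        changed = False
        for x in range(n):
            if x not in reach and any(P[x, y] > 0 for y in reach):
                reach.add(x)
                changed = True
    return len(reach) == n
```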

2.5 Finite Time Control Problems


The controlled analogue of the cost function (1.19) in Subsection 2.1.4 is
$$W(x, n, u) = E^u_{x,n} \sum_{i=n}^{(N \wedge M)-1} c(\xi_i, u_i) + E^u_{x,n}\, g\big(\xi_{N \wedge M}, N \wedge M\big), \tag{5.1}$$
where we use the notation $E^u_{x,n}$ to denote the expectation given that $\xi_n = x$
and that the control sequence $\{u_n, \ldots\}$ is used. As in Subsection 2.1.4,
$N = \min\{n: \xi_n \in \partial S\}$. Letting V(x, n) denote the infimum of W(x, n, u)
over all admissible controls and using the principle of optimality, we get
that the dynamic programming equation is
$$V(x, n) = \min_{\alpha \in U}\left[E^\alpha_{x,n} V(\xi_{n+1}, n + 1) + c(x, \alpha)\right], \tag{5.2}$$
for $x \in S - \partial S$, $n < M$. The boundary conditions are the same as in (1.21),
namely, $V(x, n) = g(x, n)$ for $x \in \partial S$ or $n = M$.
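
Equation (5.2) is solved by a single backward pass over $n = M-1, \ldots, 0$. A minimal Python sketch follows; it is illustrative only, and all names (the per-action matrices `p[a]`, cost vectors `c[a]`, the stopping cost array `g`, and the boundary mask) are hypothetical.

```python
import numpy as np

def finite_horizon_dp(p, c, g, boundary):
    """Backward recursion for (5.2) with V(x, n) = g(x, n) on the obligatory
    stopping set and at the terminal time n = M."""
    actions = list(p.keys())
    n_states, M_plus_1 = g.shape
    M = M_plus_1 - 1
    V = np.empty_like(g, dtype=float)
    V[:, M] = g[:, M]
    for n in range(M - 1, -1, -1):
        Q = np.stack([p[a] @ V[:, n + 1] + c[a] for a in actions])
        V[:, n] = Q.min(axis=0)
        V[boundary, n] = g[boundary, n]      # stop immediately on the boundary
    return V
```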
3
Dynamic Programming Equations

In this chapter we define many of the standard control problems whose


numerical solutions will concern us in the subsequent chapters. Other, less
familiar control problems will be discussed separately in later chapters.
We will first define cost functionals for uncontrolled processes, and then
formally discuss the partial differential equations which they satisfy. Then
the cost functionals for the controlled problems will be stated and the
partial differential equations for the optimal cost formally derived. These
partial differential equations are generally known as Bellman equations or
dynamic programming equations. The main tool in the derivations is Ito's
formula.
It should be noted that not only are our derivations of the equations for-
mal, but, in general, the equations themselves have only a formal meaning,
and little is known concerning existence, uniqueness, etc. (An exception to
this is the so-called "viscosity solution" method of Chapter 16.) One of the
reasons we present the equations is because their forms are very suggestive
of good numerical methods. However, when we deal with the convergence
of the numerical methods, our aim will be to show that the value functions
to which our approximations converge are optimal value functions, and we
do this by identifying an optimal controlled process with that value func-
tion. Thus, the Bellman equations never enter into the convergence proof,
and all the analysis is carried out in terms of the cost functionals. It is not
necessary that the reader be familiar with the PDE's to understand or use
the algorithms or to understand the convergence proofs.
As remarked above, our interest in the formally derived PDE, boundary
conditions, and so on, is mainly due to the fact that they suggest useful

numerical schemes. With this in mind it makes little sense to maintain any
pretense regarding mathematical rigor in these derivations. The "valida-
tion" for any derived equation will come in the form of a rigorous conver-
gence proof for numerical schemes suggested by this equation. Thus, the
formal derivations themselves are not used in any direct way. Our motiva-
tion for including them is to provide a guide to similar formal derivations
that might be useful for less standard or more novel stochastic control
problems. For a more rigorous development of the dynamic programming
equations we refer the reader to [56] and [58].

3.1 Functionals of Uncontrolled Processes


In this section we will work with the uncontrolled processes
$$dx = b(x)dt + \sigma(x)dw, \tag{1.1}$$
$$dx = b(x)dt + \sigma(x)dw + dJ, \tag{1.2}$$
where the jump term J has jump rate $\lambda(x)$ and jump distribution $\Pi(x, dy)$.
Processes (1.1) and (1.2) were discussed in Sections 1.3 and 1.5, respec-
tively. For the finite time problem, it is often of interest to introduce a time
variable explicitly and rewrite (1.1) as
$$dx = b(x, t)dt + \sigma(x, t)dw \tag{1.3}$$
and with a corresponding change in (1.2). Time variations can be included
for all the cost functionals, except for the average cost per unit time case.
The reader should have no difficulty in filling in the details, which we leave
out in order to keep the development within reasonable bounds. The func-
tions k(·), g(·), b(·), $\sigma(\cdot)$ are all assumed to be bounded and continuous. To
begin, we consider several examples (Subsections 3.1.1 to 3.1.5) involving
diffusions with no jump term.

3.1.1 Cost until a target set is reached


Let G be a compact set with a smooth boundary $\partial G$. Let $G^0$ denote the
interior of G, and suppose that G is the closure of its interior. Define the
stopping time $\tau$ by $\tau = \min\{t: x(t) \notin G^0\}$. As is customary, we make the
convention that if a stopping time is not defined, then its value is set to
infinity. Thus, if $x(t) \in G^0$ for all $t < \infty$, then $\tau = \infty$. Suppose that for
all $x \in G$ we have $E_x\tau < \infty$. [A discussion of elementary conditions on
the process components that are sufficient to ensure this property can be
found in [83, Lemma 5.7.2]. For example, it is sufficient to assume there is
i such that $a_{ii}(x) > 0$ for all x.] Define the cost functional
$$W(x) = E_x\left[\int_0^{\tau} k(x(s))ds + g(x(\tau))\right].$$


Because the process is stopped upon hitting $\partial G$, we refer to the associated
boundary condition as absorbing. From a formal point of view, W(·) satisfies
the equation
$$\mathcal{L}W(x) + k(x) = 0, \quad x \in G^0, \tag{1.4}$$
where $\mathcal{L}$ is the differential operator of (1.1). A formal derivation of (1.4) is
as follows. Under broad conditions we have $P_x\{\tau \le \Delta\}/\Delta \to 0$ as $\Delta \to 0$
for $x \in G^0$. Suppose that W(·) is bounded and in $C^2(G^0)$. For $\Delta > 0$ we
may write
$$\begin{aligned}
W(x) &= E_x \left\{ \int_0^{\Delta \wedge \tau} k(x(s))ds + g(x(\tau)) + \left[\int_{\Delta}^{\tau} k(x(s))ds\right] I_{\{\tau > \Delta\}} \right\}\\
&= E_x \left\{ \int_0^{\Delta \wedge \tau} k(x(s))ds + g(x(\tau))I_{\{\tau \le \Delta\}} + W(x(\Delta))I_{\{\tau > \Delta\}} \right\},
\end{aligned}$$
where the second equality follows from the Markov property and the defi-
nition of W(·). It follows that
$$\frac{1}{\Delta} E_x\left[W(x(\Delta)) - W(x) + \int_0^{\Delta} k(x(s))ds\right] = \frac{1}{\Delta} E_x\, h(\tau, \Delta) I_{\{\tau \le \Delta\}}, \tag{1.5}$$
where
$$h(\tau, \Delta) = W(x(\Delta)) + \int_{\tau \wedge \Delta}^{\Delta} k(x(s))ds - g(x(\tau))$$
is bounded uniformly in $\omega$ and $\Delta$. Therefore, under the condition $P_x\{\tau \le
\Delta\}/\Delta \to 0$ as $\Delta \to 0$, the right hand side of the last equation tends to zero
as $\Delta \to 0$. Applying Ito's formula to the left hand side of (1.5) and sending
$\Delta \to 0$, we formally obtain (1.4).
The boundary conditions are not so obvious, since not all the points on
the boundary are necessarily reachable by the process. Only a few com-
ments will be made on this point. We define a regular point of $\partial G$ to
be any point $x \in \partial G$ such that, for all $\delta > 0$,
$$\lim_{y \to x,\, y \in G^0} P_y\{\tau > \delta\} = 0.$$
Thus, a regular point is a point such that if the process starts nearby, then
exit is virtually assured in an arbitrarily small time and in an arbitrarily
small neighborhood. The reader is referred to [83] for further discussion
of the distinction between regular points and those points which are not
regular.
Now suppose that $x \in \partial G$ is regular. Then $E_y \int_0^{\tau} k(x(s))ds \to 0$ as
$y \to x$. Furthermore, the continuity of g implies that $E_y g(x(\tau)) \to g(x)$
as $y \to x$. Combining, we have $W(y) \to g(x)$ as $y \to x$. Thus, the correct
boundary condition is W(x) = g(x) for regular points x of $\partial G$.

The Verification Theorem. Suppose that (1.4) holds for W(·) bounded
and smooth inside G, and that the probability that x(·) exits $G^0$ through
the set of regular points is unity for each initial condition $x \in G^0$. Then by
Ito's formula, for $t < \infty$ we have
$$W(x(t \wedge \tau)) = W(x) + \int_0^{t \wedge \tau} \mathcal{L}W(x(s))ds + \int_0^{t \wedge \tau} W_x'(x(s))\sigma(x(s))dw(s),$$
which by (1.4) implies
$$E_x W(x(t \wedge \tau)) = W(x) - E_x \int_0^{t \wedge \tau} k(x(s))ds.$$
Now let $t \to \infty$, and recall our assumption $E_x\tau < \infty$. Then $W(x(t \wedge \tau)) \to
W(x(\tau)) = g(x(\tau))$ and $\int_0^{t \wedge \tau} k(x(s))ds \to \int_0^{\tau} k(x(s))ds$. By rearranging
terms and applying the dominated convergence theorem, we obtain
$$W(x) = E_x\left[\int_0^{\tau} k(x(s))ds + g(x(\tau))\right],$$
which proves that W(x) is the cost.
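
The representation just verified can also be checked by simulation. The following sketch (not from the book) estimates W(x) for a one-dimensional instance of (1.1) by a crude Euler approximation; the step size, path count, and the membership test `in_G0` are illustrative choices.

```python
import numpy as np

def mc_cost_until_exit(b, sigma, k, g, x0, in_G0, dt=1e-3, n_paths=2000,
                       t_max=100.0, seed=0):
    """Monte Carlo estimate of W(x) = E_x[ int_0^tau k(x(s)) ds + g(x(tau)) ]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        x, t, cost = x0, 0.0, 0.0
        while in_G0(x) and t < t_max:
            cost += k(x) * dt
            x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
            t += dt
        total += cost + g(x)
    return total / n_paths
```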

3.1.2 The discounted cost


For $\beta > 0$, define the discounted cost function with no boundary conditions:
$$W(x) = E_x \int_0^{\infty} e^{-\beta s}k(x(s))ds. \tag{1.6}$$
If W(x) is sufficiently smooth, then one can easily show it satisfies the
equation
$$\mathcal{L}W(x) - \beta W(x) + k(x) = 0, \quad x \in \mathbb{R}^k. \tag{1.7}$$
The idea is essentially as follows. Following the argument of the previous
subsection, for $\Delta > 0$ we write
$$\begin{aligned}
W(x) &= E_x \int_0^{\Delta} e^{-\beta t}k(x(t))dt + E_x \int_{\Delta}^{\infty} e^{-\beta t}k(x(t))dt\\
&= E_x \int_0^{\Delta} e^{-\beta t}k(x(t))dt
+ E_x e^{-\beta\Delta}\left\{ E_{x(\Delta)}\left[\int_{\Delta}^{\infty} e^{-\beta(t-\Delta)}k(x(t))dt\right]\right\}\\
&= E_x \int_0^{\Delta} e^{-\beta t}k(x(t))dt + E_x e^{-\beta\Delta} W(x(\Delta)).
\end{aligned}$$
From this we get
$$\frac{1}{\Delta} E_x\left[e^{-\beta\Delta}W(x(\Delta)) - W(x) + \int_0^{\Delta} e^{-\beta t}k(x(t))dt\right] = 0,$$
or
$$0 = E_x\frac{W(x(\Delta)) - W(x)}{\Delta} + \frac{e^{-\beta\Delta} - 1}{\Delta}\, E_x W(x(\Delta))
+ E_x\frac{1}{\Delta}\int_0^{\Delta} e^{-\beta t}k(x(t))dt.$$

Using Ito's formula and then sending $\Delta \to 0$ formally yields (1.7). An
argument analogous to that of the Verification Theorem of Subsection 3.1.1
shows that if W(·) is in $C^2(\mathbb{R}^k)$ and is bounded and (1.7) holds, then it is
actually the cost (1.6).
A target set G can be introduced as in the case of Subsection 3.1.1.
Defining G and $\tau$ as in that subsection and using
$$W(x) = E_x \int_0^{\tau} e^{-\beta s}k(x(s))ds + E_x e^{-\beta\tau} g(x(\tau)), \tag{1.8}$$
we have formally that W(·) satisfies (1.7) for $x \in G^0$, while W(x) = g(x)
holds at regular points of the boundary.
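
For orientation only, here is a naive one-dimensional finite-difference sketch for the boundary-value form (1.7)-(1.8): central differences for $\mathcal{L}$, absorbing data g at the two endpoints. It is not one of the approximations developed later in the book (those are built from Markov chains and behave better when the drift dominates), and every name in it is a placeholder.

```python
import numpy as np

def discounted_cost_1d(b, sigma, k, g, x_left, x_right, beta, n):
    """Solve b W' + (sigma^2/2) W'' - beta W + k = 0 on (x_left, x_right)
    with W = g at both endpoints, using central differences on n interior points."""
    h = (x_right - x_left) / (n + 1)
    x = x_left + h * np.arange(1, n + 1)
    a = sigma(x) ** 2                       # a(x) = sigma^2(x) in one dimension
    lower = a / (2 * h**2) - b(x) / (2 * h)
    diag = -a / h**2 - beta
    upper = a / (2 * h**2) + b(x) / (2 * h)
    A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
    rhs = -k(x)
    rhs[0] -= lower[0] * g(x_left)          # fold the boundary values into the system
    rhs[-1] -= upper[-1] * g(x_right)
    return x, np.linalg.solve(A, rhs)
```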

3.1.3 A reflecting boundary


We now consider the problem where the process is reflected rather than
absorbed on hitting the boundary of the set G. A special case will be dealt
with and even that will be treated in a casual manner. Suppose that the
boundary $\partial G$ is smooth; in particular, that it is continuously differentiable.
Let n(x) denote the outward normal to the boundary at the point x. The
reflection direction is denoted by r(x), where $r: \mathbb{R}^k \to \mathbb{R}^k$ is a continuous
function. The reflection will be "instantaneous" and will be strictly inward
in the sense that there is a $\delta > 0$ such that $n(x)'r(x) \le -\delta$ on $\partial G$. Our pro-
cess model is the stochastic differential equation with reflection of Section
1.4 with the domain and boundary data as above. See Example 1.4.3.
Any of the cost functionals used above can be adapted to the reflected
process. If the cost functional is (1.6), then the equation satisfied by the
cost functional is (1.7) with the boundary condition
$$W_x(x)'r(x) = 0 \tag{1.9}$$
on an appropriate subset of $\partial G$.


We can give the following intuitive interpretation of the boundary condi-
tion. Using the Skorokhod Problem formulation of the reflected process, it
is possible to show that for the purpose of approximately computing (1.6)
the reflected processes can be replaced by a process constructed as follows.
Since the reflection is to be "instantaneous," we can consider a process x(·)
such that upon hitting the boundary at the point x, x(·) is instantly pro-
jected a distance $r(x)\delta$, where $\delta > 0$ is small. Thus, the cost functional at
x and that at $x + r(x)\delta$ are the same for the approximation to the reflected
process, which suggests (1.9).
Part of the boundary can be absorbing and part reflecting. In order to
contain the discussion, we describe only the special case where there is an
outer boundary that is reflecting and an inner boundary that is absorbing.
Let $G_1$ and G be compact sets, each the closure of its interior, and with
$G_1 \subset G^0$. The process will be reflected instantaneously to the interior of G
on hitting its boundary, and it will be absorbed on hitting the boundary
of $G_1$. Define $\tau = \inf\{t : x(t) \in G_1\}$, and consider the cost functional
(1.8). Then W(·) formally satisfies (1.7) but with the boundary condition
W(x) = g(x) on the boundary of $G_1$ and $W_x(x)'r(x) = 0$ on the boundary
of G.

3.1.4 The average cost per unit time


When systems operate over long time periods, an appropriate cost func-
tional is the average cost per unit time. As in the previous subsections, we
work formally. Suppose that the limit
$$\gamma = \lim_{t \to \infty} \frac{E_x \int_0^t k(x(s))ds}{t} \tag{1.10}$$
exists. One can formally derive a PDE which yields $\gamma$, but we only discuss
a verification theorem. Suppose that there is a smooth function W(·) and
a constant $\gamma$ which satisfy the equation
$$\mathcal{L}W(x) = \gamma - k(x). \tag{1.11}$$
Using relation (1.11) and Ito's formula yields
$$E_x W(x(t)) - W(x) = E_x \int_0^t \left[\gamma - k(x(s))\right] ds.$$
If
$$\frac{E_x W(x(t))}{t} \to 0,$$
then $\gamma$ satisfies (1.10).
For numerical purposes it is generally necessary to work with compact
state spaces, and if the problem is not given a priori on a compact set, then
the state space must be "compactified" just so that a numerical solution
can be obtained. A typical way of doing this is to introduce a reflecting
boundary. There are many methods that are appropriate and one can even
have more complex boundary processes (such as "sticky" boundaries, or
movement along the boundary). One tries to choose a boundary condition
such that the essential features of the original problem are preserved, and
the modified problem can be solved or well approximated numerically. For

the ergodic cost problem, we can introduce a reflecting boundary $\partial G$ as in
the last subsection. Then (1.11) holds inside G and the boundary condition
(1.9) holds on $\partial G$.

3.1.5 The cost over a fixed finite time interval


Here the process is of interest over the finite interval [0, T] only, and the
process stops on hitting the boundary of the region G. Let g(·) be a con-
tinuous and bounded function on $\mathbb{R}^k \times [0, T]$. The process will stop if it
hits the boundary of G before time T. If this occurs at time $s < T$ and
the exit point is x, then the penalty will be $g(x, s)$. If the process does not
exit G before T, then the cost will be $g(x(T), T)$. It is, therefore, natural
to set the problem up with a terminal value imposed at time $t = T$. The
problem may be rewritten as an initial value problem if desired. Then the
cost, starting at point $x \in G^0$ at time $t \le T$, is
$$W(x, t) = E_{x,t}\left[\int_t^{\tau \wedge T} k(x(s))ds + g\big(x(\tau \wedge T), \tau \wedge T\big)\right].$$
From a formal point of view, it can be shown that W(·) satisfies the PDE
$$\frac{\partial W(x, t)}{\partial t} + \mathcal{L}W(x, t) + k(x) = 0$$
for $x \in G^0$, $t < T$, together with W(x, T) = g(x, T). We also have $W(y, t) \to
g(x, t)$ as $y \to x \in \partial G$ for regular points x and
$$E_{x,t}\, W(x(\tau \wedge T), \tau \wedge T) = E_{x,t}\, g(x(\tau \wedge T), \tau \wedge T).$$


A reflecting boundary and discounting can be readily added.

3.1.6 A jump diffusion example


We consider the setup of Subsection 3.1.1 for the case of a diffusion with
jumps. Recall that G is assumed to be a compact set with smooth boundary
and the property that G is the closure of its interior, and that $\tau = \inf\{t :
x(t) \in \mathbb{R}^k \setminus G^0\}$. Let x(·) satisfy (1.2), with the specified jump rate and
distribution. Once again, define the cost by
$$W(x) = E_x\left[\int_0^{\tau} k(x(s))ds + g(x(\tau))\right].$$
We can then formally derive (exactly as in Subsection 3.1.1) the equation
$$\mathcal{L}W(x) + k(x) = 0, \quad x \in G^0,$$
where $\mathcal{L}$ is defined by (1.5.5).



For the jump diffusion, we obviously need to specify more than simply
a boundary condition because at the time of exit the process may
be far removed from the set G. The condition becomes W(x) = g(x) for
$x \notin G^0$.

3.2 The Optimal Stopping Problem


Perhaps the simplest stochastic control problem is the optimal stopping
problem, where the only action to be taken is to decide when to stop
the process. The model is the diffusion (1.1) or jump diffusion (1.2), and
$\mathcal{F}_t$ denotes the underlying filtration. We would like to consider rules for
stopping that are suitably nonanticipative. For later use as well as for use
here, we give the following definition.

Definition 2.1. We say that a nonnegative random variable $\tau$ is an ad-
missible stopping time if it is an $\mathcal{F}_t$-stopping time. The stopping time is said
to be pure Markov if there is a Borel set B in the state space such that
$\tau = \inf\{t: x(t) \in B\}$.

One can show that the pure Markov stopping times are admissible. (Ac-
tually, for the pure Markov stopping times to be admissible for all Borel
sets B requires an additional technical assumption on the filtration, namely,
right continuity [52]. In keeping with the spirit of this chapter we will not
worry about this issue.) Let k(·) and g(·) be bounded and continuous real
valued functions, with $\inf_x k(x) \ge k_0 > 0$. For an admissible stopping time,
define the cost
$$W(x, \tau) = E_x\left[\int_0^{\tau} k(x(t))dt + g(x(\tau))\right] \tag{2.1}$$
and the optimal cost, where the infimum is taken over all admissible stop-
ping times:
$$V(x) = \inf_{\tau} W(x, \tau).$$

It is well known that the infimum is the same if it is taken over pure Markov
stopping times. Indeed, it is the optimal pure Markov stopping set that we
seek with the numerical methods. A further simplification, owing to the
positive lower bound on k, is that we need to consider only stopping times
whose mean value satisfies
$$E_x\tau \le \frac{2\sup_x |g(x)|}{k_0}.$$

This follows because if the mean value is larger, we will do better to stop
at t = 0.

Formal Derivation of the Bellman Equation. Let B denote the op-
timal stopping set; i.e., the process stops when the set B is reached or
entered for the first time. Then $V(x) \le g(x)$ and V(x) = g(x) only on B.
The equation satisfied by the optimal cost is
$$\begin{cases} \mathcal{L}V(x) + k(x) = 0, & x \notin B \\ V(x) = g(x), & x \in B, \end{cases} \tag{2.2}$$
where the set B is part of the solution. We will give a simple derivation.
The derivation assumes familiarity with the principle of optimality, which
is discussed in Chapter 2 for the optimal stopping problem. Suppose that
we restrict the times to be multiples of a small $\Delta > 0$. Then at each
time $n\Delta$, we have the choice of stopping or continuing. Given the current
state, the additional cost that we pay for immediate stopping is $g(x(n\Delta))$.
Heuristically, the usual dynamic programming argument tells us that the
additional cost paid for continuing and using the optimal decisions in all
future steps is $E_{x(n\Delta)}[V(x((n + 1)\Delta)) + \Delta k(x(n\Delta))]$. Thus, heuristically,
$$V(x) = \min\left[g(x),\; E_x V(x(\Delta)) + \Delta k(x)\right].$$
Now subtract V(x) from both sides:
$$\min\left[g(x) - V(x),\; E_x V(x(\Delta)) - V(x) + \Delta k(x)\right] = 0.$$
Recalling that for $x \notin B$ we have V(x) < g(x), we see that for $x \notin B$
the minimum must be taken on by the second term. If we divide by $\Delta$, we
obtain
$$\frac{1}{\Delta}\left[E_x V(x(\Delta)) - V(x)\right] + k(x) = 0.$$
If we apply Ito's formula, use the assumption that V(·) is smooth, and then
send $\Delta$ to zero, the result follows. Depending on whether the boundary is
reflecting or absorbing, the appropriate boundary conditions are added.

3.3 Control Until a Target Set Is Reached


The model will be the continuously controlled versions of (1.1) and (1.2):

$$dx = b(x, u)dt + \sigma(x)dw, \tag{3.1}$$
$$dx = b(x, u)dt + \sigma(x)dw + dJ, \tag{3.2}$$


where the jump term J is uncontrolled and satisfies the conditions in Sec-
tion 1.5. Let us start by considering just (3.1). We will assume that the
control takes values in a compact set U. Recall that the control is admissi-
ble if it is an $\mathcal{F}_t$-adapted, measurable, and U-valued process. If the control
can be written as a function of the current state and time, then we say that

it is a pure Markov control. In this section we will consider pure Markov


controls that depend only on the state. Define the covariance matrix a(·)
and differential operator $\mathcal{L}^\alpha$: $a(x) = \sigma(x)\sigma'(x)$, and for f(·) in $C^2(\mathbb{R}^k)$,
$$\mathcal{L}^\alpha f(x) = f_x(x)'b(x, \alpha) + \frac{1}{2}\mathrm{tr}\left[f_{xx}(x)a(x)\right], \tag{3.3}$$
where
$$\mathrm{tr}\left[f_{xx}(x)a(x)\right] = \sum_{ij} f_{x_i x_j}(x)a_{ij}(x).$$

We take G to be a target set as in Section 3.1, and $\tau = \inf\{t: x(t) \in \partial G\}$.
For an admissible control u(·), the cost functional is
$$W(x, u) = E_x^u\left[\int_0^{\tau} k(x(s), u(s))ds + g(x(\tau))\right]. \tag{3.4}$$
Define
$$V(x) = \inf_u W(x, u),$$
where the infimum is over the admissible controls.
We now apply a formal dynamic programming argument to derive the
PDE which is satisfied by the optimal value function V(·). Suppose that
V(·) is as smooth as necessary for the following calculations to be valid.
Suppose that there is an optimal control $\bar u(\cdot)$ which is pure Markov. Let
$\Delta > 0$, and let $\alpha$ be any value in U. Define $\tilde u(\cdot)$ to be the control process that
uses the feedback control $\bar u(\cdot)$ for $t \ge \Delta$ and uses the control identically
equal to $\alpha$ for $t < \Delta$. Define the process $\tilde x(\cdot)$ to be the process which
corresponds to use of the control $\tilde u(\cdot)$. Let $\tilde\tau$ denote the time that the
target set is reached under this composite control. Let x(·) and $\tau$ denote
the solution and escape time under the optimal control $\bar u(\cdot)$. By definition,
we have
$$V(x) = E_x^{\bar u}\left[\int_0^{\tau} k(x(s), \bar u(x(s)))ds + g(x(\tau))\right].$$
The optimality of V(·) implies
$$\begin{aligned}
V(x) &\le E_x^{\tilde u}\left[\int_0^{\tilde\tau} k(\tilde x(s), \tilde u(s))ds + g(\tilde x(\tilde\tau))\right]\\
&= E_x^{\tilde u}\left[\int_0^{\tilde\tau \wedge \Delta} k(\tilde x(s), \alpha)ds + g(\tilde x(\tilde\tau))I_{\{\tilde\tau < \Delta\}}\right]
+ E_x^{\tilde u}\left[\int_{\Delta}^{\tilde\tau} k(\tilde x(s), \bar u(\tilde x(s)))ds + g(\tilde x(\tilde\tau))\right] I_{\{\tilde\tau \ge \Delta\}}.
\end{aligned}$$
By the Markov property, the definition of $\tilde u(\cdot)$, and the optimality of $\bar u(\cdot)$,
the inequality above may be rewritten as
$$V(x) \le E_x^{\tilde u}\left[\int_0^{\tilde\tau \wedge \Delta} k(\tilde x(s), \alpha)ds + g(\tilde x(\tilde\tau))I_{\{\tilde\tau < \Delta\}} + V(\tilde x(\Delta))I_{\{\tilde\tau \ge \Delta\}}\right].$$



Therefore
$$\frac{1}{\Delta} E_x^{\tilde u}\left[V(\tilde x(\Delta)) - V(x) + \int_0^{\Delta} k(\tilde x(s), \alpha)ds\right]
\ge \frac{1}{\Delta} E_x^{\tilde u}\, h(\tilde\tau, \Delta, u)\, I_{\{\tilde\tau < \Delta\}}, \tag{3.5}$$
where
$$h(\tilde\tau, \Delta, u) = V(\tilde x(\Delta)) + \int_{\tilde\tau \wedge \Delta}^{\Delta} k(\tilde x(s), \alpha)ds - g(\tilde x(\tilde\tau))$$
is bounded uniformly in $\omega$ and $\Delta$. If we assume (as in Subsection 3.1.1)
the condition $P_x^{\tilde u}\{\tilde\tau < \Delta\}/\Delta \to 0$ as $\Delta \to 0$, then the right hand side of
(3.5) tends to zero as $\Delta \to 0$. Therefore, taking this limit yields that, for
any value of $\alpha$ in U,
$$\mathcal{L}^\alpha V(x) + k(x, \alpha) \ge 0.$$
Suppose in the calculations above that $\alpha$ is replaced by $\bar u(x(s))$ on $[0, \Delta)$,
and that $\bar u(\cdot)$ is continuous at x. Then the analogue of (3.5) holds with the
inequality replaced by an equality. We then formally obtain the equation
$$\mathcal{L}^{\bar u(x)} V(x) + k(x, \bar u(x)) = 0.$$
It follows that
$$\begin{cases} \inf_{\alpha \in U}\left[\mathcal{L}^\alpha V(x) + k(x, \alpha)\right] = 0, & x \in G^0 \\ V(x) = g(x), & x \in \partial G. \end{cases} \tag{3.6}$$
It should also be noted that
$$E_x^u V(x(t \wedge \tau)) \to E_x^u g(x(\tau))$$
for admissible u.

A Verification Theorem. Suppose that there is a bounded function V(·)
which is in $C^2(G^0)$ and a feedback control $\bar u(x)$ such that (3.6) holds with
the infimum taken on at $\alpha = \bar u(x)$, and that $E_x\tau < \infty$ holds for admissible
controls for which the cost is bounded. [To simplify the discussion, we will
assume that b(·, ·), $\sigma(\cdot)$, and the control $\bar u(\cdot)$ are all Lipschitz. This allows all
solutions considered here to be defined on a common fixed probability space
(see Section 1.3). This condition will not be assumed in the convergence
proofs later in the book.] Then it can be shown that V(x) is indeed the
optimal cost and $\bar u(\cdot)$ an optimal control. We will give an outline of the
proof.
Let u(·) be an admissible control with x(·) the associated solution to
(3.1). By the minimization in (3.6),
$$0 = \mathcal{L}^{\bar u(x)}V(x) + k(x, \bar u(x)),$$
$$0 \le \mathcal{L}^{u(t)}V(x(t)) + k(x(t), u(t))$$
for all values of $x \in G^0$, $t \ge 0$, and $\omega$. Let $\tau$ and $\tilde\tau$ denote the escape times
under the controls $\bar u(\cdot)$ and u(·), respectively. Then by Ito's formula we can
write
$$-E_x^{\bar u}V(x(t \wedge \tau)) + V(x) = -E_x^{\bar u}\int_0^{t \wedge \tau} \mathcal{L}^{\bar u(x(s))}V(x(s))ds
= E_x^{\bar u}\int_0^{t \wedge \tau} k(x(s), \bar u(x(s)))ds$$
and
$$-E_x^{u}V(x(t \wedge \tilde\tau)) + V(x) = -E_x^{u}\int_0^{t \wedge \tilde\tau} \mathcal{L}^{u(s)}V(x(s))ds
\le E_x^{u}\int_0^{t \wedge \tilde\tau} k(x(s), u(s))ds.$$
Suppose that $E_x^u V(x(t \wedge \tilde\tau)) \to E_x^u g(x(\tilde\tau))$ and $E_x^{\bar u}V(x(t \wedge \tau)) \to E_x^{\bar u}g(x(\tau))$
as $t \to \infty$. Then the equations just given imply
$$W(x, \bar u) = E_x^{\bar u}\left[\int_0^{\tau} k(x(s), \bar u(x(s)))ds + g(x(\tau))\right] = V(x)$$
and
$$V(x) \le E_x^{u}\left[\int_0^{\tilde\tau} k(x(s), u(s))ds + g(x(\tilde\tau))\right] = W(x, u).$$
Hence, the minimizing control $\bar u(\cdot)$ is optimal.


There are numerous combinations of this framework of controlled drift
term with the cost functions and processes which are discussed in this chap-
ter. For example, an absorbing or target set can be added to the optimal
stopping problem, and one can have both a choice over the stopping time
and a continuously running cost. For example, consider a problem with the
cost (3.4), but where the stopping timer can be chosen to be any admis-
sible stopping time which is no greater than the first time that the set C 0
is exited. Then the Bellman equation is

{ infaEU [.CaV(x) + k(x,a)] = 0, x E C 0 , x fl. B,


V(x) = g(x), x E B,

with the absorbing boundary condition on 8C and with the set B part of
the solution.

3.4 A Discounted Problem with a Target Set and


Reflection
Let us add a discount factor to the problem discussed in Section 3.3. The
model can be either (1.1) or (1.2). The cost is now
$$W(x, u) = E_x^u\left[\int_0^{\tau} e^{-\beta t} k(x(t), u(t))dt + e^{-\beta\tau} g(x(\tau))\right]. \tag{4.1}$$
Then the Bellman equation for the minimum cost is
$$\inf_{\alpha \in U}\left[\mathcal{L}^\alpha V(x) - \beta V(x) + k(x, \alpha)\right] = 0, \quad x \in G^0, \tag{4.2}$$
together with the absorbing boundary condition.
As another alteration, let the "local discount rate" depend on the state.
In particular, define $A(t) = \exp\left[-\int_0^t \beta(x(s))ds\right]$ for some bounded, contin-
uous, and nonnegative function $\beta(\cdot)$. Let the cost be (4.1), with $e^{-\beta t}$
replaced by A(t). Then the Bellman equation for the optimal cost is just
(4.2), with the $\beta$ replaced by $\beta(x)$.
Let the cost be (4.1), and let the process be the reflected form of (1.1),
with reflection set $\partial G$ and reflection direction r(x) as in Section 1.3. Then
the Bellman equation is
$$\inf_{\alpha \in U}\left[\mathcal{L}^\alpha V(x) - \beta V(x) + k(x, \alpha)\right] = 0, \quad x \in G^0$$
with boundary condition
$$V_x(x)'r(x) = 0.$$

3.5 Average Cost Per Unit Time


For the average cost per unit time problem to be computable from a numer-
ical analysis point of view, it is usually necessary that the state space be
compact. Thus, we are essentially confined to some sort of reflection prob-
lem. There could be some more general boundary process and this will be
discussed in the chapter dealing with the convergence of the approximation
for the reflected case. For the purposes of this chapter, we simply state the
Bellman equation for an ergodic cost function and with a reflected diffu-
sion. Let the domain G and directions of reflection r(·) be as in Subsection
3.1.3. Suppose that there is a smooth function V(·) and a constant $\gamma$ which
satisfy the equation
$$\inf_{\alpha \in U}\left[\mathcal{L}^\alpha V(x) - \gamma + k(x, \alpha)\right] = 0, \quad x \in G^0,$$
$$V_x(x)'r(x) = 0, \quad x \in \partial G.$$
Let the infimum be taken on by a feedback function $\bar u(\cdot)$ under which the
reflected diffusion is well defined. Then one can show that $\bar u(\cdot)$ is optimal
with respect to any admissible control u(·) for which
$$E_x^u V(x(t))/t \to 0$$
as $t \to \infty$. The proof of a verification theorem for this problem combines
the ideas of Subsection 3.1.4 and Section 3.3.
4
The Markov Chain Approximation
Method: Introduction

The main purpose of the book is the development of numerical methods for
the solution of control or optimal control problems, or for the computation
of functionals of the stochastic processes of interest, of the type described
in Chapters 3, 7-9, and 12-15. It was shown in Chapter 3 that the cost or
optimal cost functionals can be the (at least formal) solutions to certain
nonlinear partial differential equations. It is tempting to try to solve for
or approximate the various cost functions and optimal controls by deal-
ing directly with the appropriate PDE's, and numerically approximating
their solutions. A basic impediment is that the PDE's often have only a
formal meaning, and standard methods of numerical analysis might not be
usable to prove convergence of the numerical methods. For many problems
of interest, one cannot even write down a partial differential equation. The
Bellman equation might be replaced by a system of "variational inequali-
ties," or the proper form might not be known. Optimal stochastic control
problems occur in an enormous variety of forms. As time goes on, we learn
more about the analytical methods which can be used to describe and an-
alyze the various optimal cost functions, but even then it seems that many
important classes of problems are still not covered and new models appear
which need even further analysis. The optimal stochastic control or stochas-
tic modeling problem usually starts with a physical model, which guides
the formulation of the precise stochastic process model to be used in the
analysis. One would like numerical methods which are able to conveniently
exploit the intuition contained in the physical model.
The general methods developed in this book can be applied to a very
broad class of stochastic and deterministic control problems, as well as to

problems of optimal filtering and optimal filtering combined with control,


or for the computation of a large class of functionals of diffusion or jump
diffusion processes. The methods are quite intuitive. They do not require
an understanding of the analytical properties of the equation of the model
or of the Bellman equation for the optimal cost function. They have been
used with success on most of the usual problems of stochastic control. Some
common forms of the methods do reduce to standard finite element or finite
difference methods. But then, owing to the degeneracies of the operators or
to nonstandard boundary conditions or controls, it seems that the standard
methods of proof of numerical analysis often cannot be used.
In this chapter, we first describe the basic idea, which is quite simple. The
procedure will be illustrated via a "canonical" problem, and other classes
of problems will be treated in subsequent chapters.
The basic idea is the following. We approximate the original (controlled
or not) problem with a simpler (controlled or not) stochastic process model
and associated cost function for which the desired computation can be car-
ried out. In particular, the approximating process is a (controlled or not)
Markov chain on a finite state space. This state space is a "discretization"
of the original state space of the problem. There are many methods for
the numerical solution of such Markov chain problems (see, for example,
Chapter 6). The approximating Markov chain is chosen such that certain
"local" properties of the approximating chain are "similar" to those of the
original controlled process. A cost function for the Markov chain model
which is an appropriate analogue to that for the original model is then
found. One chooses a Markov chain model for which the computation is
reasonable. It turns out that the procedure can be used almost automati-
cally, in that there are standard methods which can be used to construct
the chains and cost functions. Two classes of methods, discussed in the
next chapter, illustrate the possibilities.
The approximating chain is parameterized by a parameter (say, analo-
gous to a finite difference interval or to a "finite element size" in classical
numerical analysis), such that as the parameter goes to, say, zero, the "lo-
cal properties" of the chain resemble more and more closely those of the
original process. By local properties, we mean, essentially, the mean and
mean square change per step, under any control, as well as the mean reflec-
tion direction for the problem with a reflecting boundary. Under very broad
conditions, one can prove that the sequence of optimal cost functions for
the sequence of approximating chains converges to that for the underlying
original process as the approximation parameter goes to zero. The proofs
are purely probabilistic: We never need to appeal to regularity properties
of or even explicitly use the Bellman equation, whether it is formal or not.
In addition, one can take advantage of knowledge of or intuition concern-
ing the physical process. The optimal value function for the approximating
chain is an optimal value function for a controlled process and cost cri-
terion which are very close to the originally given ones. The convergence

is analogous to the convergence of a sequence of finite difference or finite


element approximations to a PDE as the approximation interval goes to
zero.
In order to illustrate the idea in a simple form, we first show how to use it
on a controlled Wiener process in Section 4.4. In Section 4.5, we show how
to use it on a deterministic problem for which current alternative methods
seem to be less intuitive, and even more complex. It will be shown that one
form of the approach is equivalent to a finite element approximation to the
Bellman equation for that deterministic problem, and that a convergence
proof can be readily obtained using simple probabilistic methods, even
though the original problem is not probabilistic.
In the previous chapter, we defined an admissible control for several
types of stochastic problems. It turns out that for convergence analysis,
neither that definition of admissible control nor its deterministic analogue
are always adequate, and one has to enlarge the class of allowed controls,
to the so-called "relaxed controls." The class is enlarged in such a way that
the infimum of the value functions over the enlarged class is the same as
that over the original class, so that the optimal value function does not
change when working with the larger class of controls. The enlarged class
of controls is used for analytical purposes only. The numerical procedures
will always give feedback controls. The discussion of the convergence of
the numerical method applied to the deterministic problem gives us the
opportunity to introduce the notion of deterministic relaxed control, and
to discuss its proper role. This will be developed further together with the
stochastic relaxed control in Chapter 9. There is some overlap between this
chapter and Chapter 5, but that overlap concerns fundamental points.
The general Markov chain approximation method is outlined in Section
4.1. The actual process which approximates the original controlled diffu-
sion or jump diffusion is a continuous time parameter interpolation of the
Markov chain. Two useful interpolations are discussed in Sections 4.2 and
4.3. The process discussed in Section 4.3 is a continuous time parameter
Markov chain and will be the one most used in the sequel. The ideas of
Sections 4.1 to 4.3 will be used heavily in the rest of the book. The ma-
terial in Sections 4.4 to 4.6 illustrates their application in special simple
problems.

4.1 The Markov Chain Approximation Method


In this section, we will describe and motivate the basic type of Markov
chain approximation that will be used and show that it is quite natural.
For illustrative purposes, consider the diffusion process model:

$$dx(t) = b(x(t), u(t))dt + \sigma(x(t))dw. \tag{1.1}$$



Let G be a compact set which is the closure of its interior $G^0$. For $\beta > 0$,
we consider the discounted cost
$$W(x, u) = E_x^u \int_0^{\tau} e^{-\beta t}k(x(t), u(t))dt + E_x^u e^{-\beta\tau} g(x(\tau)), \tag{1.2}$$
where $\tau = \inf\{t: x(t) \notin G^0\}$, the first escape time of x(·) from $G^0$. Define
$$V(x) = \inf_u W(x, u),$$
where the infimum is over all admissible controls. Recall (Section 1.3) that
an admissible control u(·) is a measurable process which is nonanticipative
with respect to w(·), and u(t) takes values in U, a compact set. The devel-
opment will be entirely formal, because we are concerned with motivation
only. But in order to be certain that we are on solid ground, let us sup-
pose here that the diffusion is well defined for any admissible control. In
particular, suppose that:

A1.1. b(·) and $\sigma(\cdot)$ are bounded, continuous, and Lipschitz continuous in
x, uniformly in u. Both k(·) and g(·) are bounded and continuous.

The conditions will be weakened in Chapters 9-15.

Approximating the Process: General Remarks. The methods to be


employed are not based on the analytical expressions (e.g., the formal
PDE's of Chapter 3) for the functions whose values we wish to compute.
Rather, they are based on approximating the basic controlled process ( 1.1)
by a simpler controlled process, for which the evaluation of either the cost
function for a fixed control or of the optimal cost can be done with an
acceptable amount of computational work. If the approximating controlled
process is close to the original process x(·) in an appropriate statistical
sense and the form of the associated cost function is close to (1.2), then
we would expect that the value of the cost function for the approximating
process for a fixed control (or its optimal value over all controls) will be
close to the cost function W(x, u) for a similar control [or to its optimal
value V(x), respectively]. The approximating processes will be piecewise
constant. Let h > 0 be a scalar approximation parameter. The basis of the
approximation is a discrete time parameter finite state controlled Markov
chain $\{\xi_n^h, n < \infty\}$ whose "local properties" are "consistent" with those
of (1.1). The continuous time parameter approximating process will be a
piecewise constant interpolation of this chain, with appropriately chosen
interpolation intervals. The next chapter is devoted to convenient ways of
getting the approximating chains.

A Markov Chain Approximation: Terminology. For each h > 0, let


{ ~~, n < oo} be a controlled discrete parameter Markov chain on a discrete
4.1 Markov Chain Approximation 71

state space Sh E JRr with transition probabilities denoted by ph(x, yja).


The a is the control parameter and takes values in the compact set U. We
use u~ to denote the random variable which is the actual control action for
the chain at discrete time n. We now define some terms which will allow us
to relate the chain to the diffusion {1.1). In Chapter 5, we will show how
all of these quantities can be readily calculated. Suppose that we have an
"interpolation interval" ~th(x,a) > 0, and define ~t~ = ~th(e~,u~). Let
SUPx,a ~th(x, a) -+ 0 as h -+ 0, but infx,a ~th(x, a) > 0 for each h > 0.
This positivity will be relaxed when considering systems with singular con-
trols or instantaneously reflecting boundaries. Let C~ denote the compo-
nents of the state space which are interior to the set C : C~ = Sh n C 0 .
Thus, C~ is the finite state space of the chain until it escapes from C 0 .
Define the difference ~e~ = e~+l - e~. Let E~;;: denote the conditional
expectation given {ef,u?,i:::; n,e~ = x,u~ =a}. Suppose that the chain
obeys the following "local consistency" conditions, which also define the
functions bh(·) and ah(·):
E~·;:~~~ = bh(x, a)~th(x, a) = b(x, a)~th(x, a)+ o(~th(x, a)),
'
E~;;:[~e~- E~;;:~e~][L~e~- E~;;:~e~J' = ah(x)~th(x,a)
= a(x)~th(x, a)+ o(Llth(x, a)), {1.3)

a(x) = a(x)a'(x),
SUPn,w !e~+l- e~j ~ 0.
Note that the chain has the "local properties" of the diffusion process (1.1)
in the following sense. By Section 1.3, letting $x(0) = x$ and $u(t) = \alpha$ on the
interval $[0, \delta]$ in (1.1) gives us
$$E_x(x(\delta) - x) = b(x, \alpha)\delta + o(\delta),$$
$$E_x[x(\delta) - x][x(\delta) - x]' = a(x)\delta + o(\delta). \tag{1.4}$$
The local consistency (1.3) is essentially all that is required of the approxi-
mating chain, except for the analogous considerations which will be needed
when dealing with jump diffusions, reflecting boundaries, or singular con-
trols.
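
Local consistency is easy to test numerically for any proposed chain. The sketch below is purely illustrative; the callables `ph`, `dth` and the state list are assumed interfaces, not the book's notation. It computes the one-step conditional mean and covariance and compares them with $b(x,\alpha)\Delta t^h$ and $a(x)\Delta t^h$.

```python
import numpy as np

def consistency_errors(states, ph, dth, b, a, x, alpha):
    """Return the drift and covariance mismatches in (1.3) at state x, control alpha."""
    dt = dth(x, alpha)
    probs = np.array([ph(x, y, alpha) for y in states])
    diffs = np.array([np.atleast_1d(y) - np.atleast_1d(x) for y in states], dtype=float)
    mean = probs @ diffs                                   # E[ Delta xi ]
    centered = diffs - mean
    cov = centered.T @ (probs[:, None] * centered)         # E[(Dxi - m)(Dxi - m)']
    drift_err = np.linalg.norm(mean / dt - np.atleast_1d(b(x, alpha)))
    cov_err = np.linalg.norm(cov / dt - np.atleast_2d(a(x)))
    return drift_err, cov_err
```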
Following the terminology in Chapter 2, we say that a control policy
$u^h = \{u_n^h, n < \infty\}$ for the chain is admissible if the chain has the Markov
property under that policy. In particular, the policy is admissible if
$$P\{\xi_{n+1}^h = y \mid \xi_i^h, u_i^h, i \le n\} = p^h(\xi_n^h, y \mid u_n^h). \tag{1.5}$$
Let $E_x^{u^h}$ denote the expectation, given that $\xi_0^h = x$ and that either an ad-
missible control sequence $u^h = \{u_n^h, n < \infty\}$ or a feedback control denoted
by $u^h(\cdot)$ is used, according to the case.

4.2 Continuous Time Interpolation and


Approximating Cost Function
The chain $\{\xi_n^h, n < \infty\}$ is a discrete time parameter process. In order to
approximate the continuous time parameter process x(·), we will need to
use an appropriate continuous time interpolation. Owing to the properties
of the "interpolation interval" $\Delta t^h(x, \alpha)$, a natural interpolating time scale
might be obvious. There are two interpolations which are useful; the first
[to be denoted by $\xi^h(\cdot)$] uses interpolation intervals $\Delta t_n^h = \Delta t^h(\xi_n^h, u_n^h)$.
The second interpolation [to be denoted by $\psi^h(\cdot)$] is actually a continuous
parameter Markov process. The first was used in much of the convergence
analysis in past works [94, 90]. The second was initially introduced in [101].
It allows some simplifications in the proofs and will be used in most of the
convergence analysis of this book. In this section, we will define $\xi^h(\cdot)$ and
discuss appropriate analogues [(2.2) and (2.3) below] of the cost function
(1.2) for the chain, and then write the dynamic programming equations for
the resulting control problem. In the next section, the second interpolation
$\psi^h(\cdot)$ is defined and its properties discussed.
Let $\{u_n^h, n < \infty\}$ be an admissible control for the chain and define the
interpolated time $t_n^h = \sum_{i=0}^{n-1} \Delta t_i^h$. Define the continuous parameter inter-
polations $\xi^h(\cdot)$ and $u^h(\cdot)$ by:
$$\xi^h(t) = \xi_n^h, \quad u^h(t) = u_n^h, \quad \text{for } t \in [t_n^h, t_{n+1}^h). \tag{2.1}$$

See Figure 4.1.


Figure 4.1. Construction of the interpolation $\xi^h(\cdot)$.

The interpolated process defined by (2.1) is an approximation to the
diffusion (1.1) in the sense that the "local properties" (1.3) hold. The in-
terpolation intervals $\Delta t^h(x, \alpha)$ can always be chosen to be constant if we
wish (see Section 5.2), but we might find that restrictive. For example, if
the local velocity b(·) is large at some value of x, then we might want to
use a smaller interpolation interval there. Also, the numerical procedures
converge faster when we take advantage of the added flexibility allowed by
the variable intervals. The interpolated process $\xi^h(\cdot)$ is piecewise constant.
Given the value of the current state and control action, the current interval
is known. The interpolation intervals are obtained automatically when the
transition functions $p^h(x, y|\alpha)$ are constructed. See Chapter 5 and Sections
4.4 and 4.5 below.
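
A few lines of code make the construction of $\xi^h(\cdot)$ explicit. This is only an illustration: `xi` is a simulated chain path and `dt` the corresponding intervals $\Delta t_n^h$, both assumed given.

```python
import numpy as np

def piecewise_constant_interpolation(xi, dt):
    """Return xi_h(.) of (2.1): xi_h(s) = xi[n] for s in [t_n^h, t_{n+1}^h)."""
    t = np.concatenate(([0.0], np.cumsum(dt)))   # t_n^h = sum_{i<n} dt_i^h
    def xi_h(s):
        n = int(np.searchsorted(t, s, side="right")) - 1
        n = min(max(n, 0), len(xi) - 1)
        return xi[n]
    return xi_h, t
```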

Let $N_h$ denote the first time that $\{\xi_n^h, n < \infty\}$ leaves $G_h^0$. Then, the
first exit time of $\xi^h(\cdot)$ from $G^0$ is $\tau_h = t_{N_h}^h$. There are several natural cost
functionals for the chain which approximate (1.2), depending on how time
is discounted on the intervals of constancy $[t_n^h, t_{n+1}^h)$. If the discounting is
constant on this interval then we can use the approximation
$$W_1^h(x, u^h) = E_x^{u^h} \sum_{n=0}^{N_h - 1} e^{-\beta t_n^h} k(\xi_n^h, u_n^h)\Delta t_n^h + E_x^{u^h} e^{-\beta\tau_h} g(\xi_{N_h}^h). \tag{2.2}$$
If future time is to be "continuously discounted," then we can use the
approximation
$$W_2^h(x, u^h) = E_x^{u^h} \sum_{n=0}^{N_h - 1} \left[\int_{t_n^h}^{t_{n+1}^h} e^{-\beta s}ds\right] k(\xi_n^h, u_n^h) + E_x^{u^h} e^{-\beta\tau_h} g(\xi_{N_h}^h). \tag{2.3}$$
We have
$$\sup_{x, u^h}\left|W_1^h(x, u^h) - W_2^h(x, u^h)\right| = O\Big(\sup_{x,\alpha}\Delta t^h(x, \alpha)\Big),$$
which goes to zero as $h \to 0$ because $\sup_{x,\alpha} \Delta t^h(x, \alpha) \to 0$. Let us define
$V_i^h(x) = \inf_u W_i^h(x, u)$, where the infimum is over all admissible controls.
The cost functions (2.2) and (2.3) both approximate (1.2). We clearly have
$$|V_1^h(x) - V_2^h(x)| \to 0.$$
The dynamic programming equation (2.3.2) for cost function (2.2) is
$$V_1^h(x) = \begin{cases}
\min_{\alpha \in U}\left[\displaystyle\sum_y e^{-\beta\Delta t^h(x,\alpha)} p^h(x, y|\alpha)\, V_1^h(y) + k(x, \alpha)\Delta t^h(x, \alpha)\right], & x \in G_h^0, \\[1ex]
g(x), & x \notin G_h^0.
\end{cases} \tag{2.4}$$

The dynamic programming equation for $V_2^h(x)$ is the same, except that the
coefficient $\Delta t^h(x, \alpha)$ of $k(x, \alpha)$ is replaced by
$$\int_0^{\Delta t^h(x,\alpha)} e^{-\beta s}ds = \left[1 - e^{-\beta\Delta t^h(x,\alpha)}\right]\big/\beta = \Delta t^h(x, \alpha) + O\big((\Delta t^h(x, \alpha))^2\big). \tag{2.5}$$
A third possibility for the approximation of the discount factor appears in
(3.7), and is based on the continuous parameter Markov chain interpolation.
The difference between the solutions of (2.4) and (3.7) goes to zero as $h \to 0$.

Discussion. The similarity of the cost functions (2.2) and (2.3) to (1.2) and
the similarity of the local properties of the interpolation $\xi^h(\cdot)$ to those of
the original controlled diffusion $x(\cdot)$ suggest that the $V_i^h(x)$ might be good
approximations to $V(x)$ for small values of $h$. This turns out to be true.
Any sequence $\xi^h(\cdot)$ has a subsequence which converges in an appropriate
sense to a controlled diffusion of the type (1.1). This will be dealt with in
Chapters 9 and 10. Suppose that $u^h(x)$ is the optimal control for the chain
$\{\xi_n^h, n < \infty\}$ with cost function (say) (2.2), and suppose that the associated
sequence $\xi^h(\cdot)$ converges to a limit diffusion $x(\cdot)$ with admissible control
$u(\cdot)$. Under quite broad conditions, the sequence $\tau_h$ of times that the chains
first exit $G_h^0$ will also converge to the time that the limit process $x(\cdot)$ first
exits $G^0$. If this is the case, then the cost functionals $V_1^h(x)$ for the sequence
of chains will converge to the cost functional $W(x,u)$ for the limit process.
Because $V(x)$ is the optimal value function, we have that $W(x,u) \ge V(x)$
and, hence, $\liminf_h V_1^h(x) \ge V(x)$. The reverse inequality will be proved
by another approximation procedure, which uses the optimality of the cost
functionals $V_1^h(x)$ for the controlled chain. For the mathematical proof of
the convergence, we might need to extend the class of allowed controls to
a class of so-called "relaxed controls," but the infimum of the cost function
over the original class of controls and that over the new class of controls
are equal. A good example of the entire procedure is given in Sections 4.5
and 4.6 below for a simple deterministic problem.

The Markov chain approximation method is thus quite straightforward:
(a) get a locally consistent chain; (b) get a suitable approximation to the
original cost function for the chain.
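The dynamic programming equation (2.4) is a fixed point equation for $V_1^h$ and,
with $\beta > 0$, can be solved by successive approximation. The sketch below (in
Python; it is not from the book, and the arrays `p`, `dt`, `k`, `g` and the
index set `interior` are hypothetical placeholders for data produced by a
locally consistent construction such as those of Chapter 5) illustrates the idea
for a finite state space and a finite control set.

```python
import numpy as np

def value_iteration(p, dt, k, g, interior, beta, tol=1e-8, max_iter=10000):
    """Solve V(x) = min_a [ exp(-beta*dt[x,a]) * sum_y p[x,a,y] V(y) + k[x,a]*dt[x,a] ]
    for x in the interior (cf. (2.4)), with V(x) = g[x] off the interior.

    p  : array (nx, na, nx) of transition probabilities p^h(x, y | a)
    dt : array (nx, na) of interpolation intervals Delta t^h(x, a)
    k  : array (nx, na) of running costs, g : array (nx,) of boundary costs
    interior : boolean array (nx,) marking the states of G_h^0
    """
    V = g.copy()                                           # start from the boundary cost
    for _ in range(max_iter):
        disc = np.exp(-beta * dt)                          # (nx, na) discount factors
        Q = disc * np.einsum('xay,y->xa', p, V) + k * dt   # cost of each pair (x, a)
        V_new = np.where(interior, Q.min(axis=1), g)       # minimize over a; keep g elsewhere
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

With a positive discount rate the right hand side is a contraction, so the
iteration converges from any starting guess.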

4.3 A Continuous Time Markov Chain Interpolation

By the construction of the process $\xi^h(\cdot)$ in the last section, its "holding
times" or interpolation intervals are $\Delta t^h(\xi_n^h, u_n^h)$. Once the control and
state at the start of the $n$-th interval are known, the length of the interval
is known. It is sometimes more convenient for the proofs of convergence
to use a continuous parameter interpolation of $\{\xi_n^h, n < \infty\}$ which is a
Markov process itself. We now construct such an interpolation, which will
be denoted by $\psi^h(\cdot)$. Define $\tau_0^h = 0$, let $\{\tau_n^h, n < \infty\}$ denote the moments
of change of $\psi^h(\cdot)$, and set $\Delta\tau_n^h = \tau_{n+1}^h - \tau_n^h$. Define $\psi^h(\cdot)$ at the $\tau_n^h$ by
$$
\psi^h(\tau_n^h) = \xi_n^h, \quad n < \infty.
\tag{3.1}
$$
Alternatively written,
$$
\psi^h(t) = \xi_0^h + \sum_{i:\,\tau_{i+1}^h \le t} \Delta\xi_i^h.
\tag{3.1$'$}
$$
We need only define the distribution of $\Delta\tau_n^h$, conditioned on $(\xi_n^h = x,\ u_n^h = \alpha)$.
This will be an exponential distribution with mean value $\Delta t^h(x,\alpha)$;
i.e.,
$$
P\{\Delta\tau_n^h \le t \mid \xi_i^h, u_i^h, \tau_i^h, i \le n;\ \xi_n^h = x, u_n^h = \alpha\}
 = 1 - \exp[-t/\Delta t^h(x,\alpha)].
$$
Consequently,
$$
E_{x,n}^{h,\alpha}\,\Delta\tau_n^h = \Delta t^h(x,\alpha),
\tag{3.2}
$$
which is just the interpolation interval for the $\xi^h(\cdot)$.
Using the same notation as in the last section, define $u^h(\cdot)$ by $u^h(t) = u_n^h$
for $t \in [\tau_n^h, \tau_{n+1}^h)$. Let $P_{x,n}^{h,\alpha}$ denote the conditional probability associated
with the conditional expectation $E_{x,n}^{h,\alpha}$. Let $E_{x,t}^{h,\alpha}$ (with associated conditional
probability $P_{x,t}^{h,\alpha}$) denote the expectation given the data
$$
\{\psi^h(s), u^h(s), s \le t;\ \tau_i^h : \tau_i^h \le t;\ \psi^h(t) = x,\ u^h(t) = \alpha\}.
$$

Local Properties of $\psi^h(\cdot)$ and a Convention Concerning "Zero"
Jumps. For the controlled Markov chain $\{\xi_n^h, n < \infty\}$, it is possible that
the transition probability $p^h(x,x|\alpha)$ be positive for some values of $x$ and
$\alpha$. In this case, the actual sample value of $\psi^h(\cdot)$ might not change with
probability one at each time $\tau_n^h$. For notational consistency, we allow "zero"
jumps for $\psi^h(\cdot)$. That is, the jump times for $\psi^h(\cdot)$ are defined to be the
times $\{\tau_n^h\}$, whether or not $\psi^h(\cdot)$ actually changes value at all of those times.
By definition, for each $h$,
$$
P_{x,t}^{h,\alpha}\{\text{jump on } [t, t+\delta)\} = \frac{\delta}{\Delta t^h(x,\alpha)} + o(\delta).
$$
For $\delta > 0$, define the increment $\Delta\psi^h(t) = \psi^h(t+\delta) - \psi^h(t)$. The local
properties of $\psi^h(\cdot)$ follow from (1.3) and the above definitions and are
$$
P_{x,t}^{h,\alpha}\{\psi^h(t+\delta) = y,\ \text{jump on } [t, t+\delta)\}
 = \frac{\delta}{\Delta t^h(x,\alpha)}\, p^h(x,y|\alpha) + o(\delta),
\tag{3.3}
$$

$$
\begin{aligned}
E_{x,t}^{h,\alpha}\,\Delta\psi^h(t)
 &= P_{x,t}^{h,\alpha}\{\text{jump on } [t,t+\delta)\}\sum_y p^h(x,y|\alpha)\,(y - x)\\
 &= P_{x,t}^{h,\alpha}\{\text{jump on } [t,t+\delta)\}\, b^h(x,\alpha)\,\Delta t^h(x,\alpha)\\
 &= \frac{\delta}{\Delta t^h(x,\alpha)}\, b(x,\alpha)\,\Delta t^h(x,\alpha)
   + \delta\,\frac{o(\Delta t^h(x,\alpha))}{\Delta t^h(x,\alpha)} + o(\delta),
\end{aligned}
\tag{3.4}
$$
$$
E_{x,t}^{h,\alpha}\,[\Delta\psi^h(t)][\Delta\psi^h(t)]'
 = a(x)\,\delta + \delta\,\frac{o(\Delta t^h(x,\alpha))}{\Delta t^h(x,\alpha)} + o(\delta).
\tag{3.5}
$$
These will be extended to cover the jump diffusion process in Chapter 5. We
define an admissible control for the process $\psi^h(\cdot)$ as it was defined for the
$\{\xi_n^h, n < \infty\}$. It is any $U$-valued process which is constant on the intervals
$[\tau_n^h, \tau_{n+1}^h)$ and for which the imbedded chain $\{\xi_n^h, n < \infty\}$ has the Markov
property (1.5). Thus, there is a complete equivalence between the control
models with the discrete and continuous parameter chains.

Abusing terminology, let us reuse the symbols $\tau_h$ for the escape times
from $G^0$ as in the last section, but here it is for the process $\psi^h(\cdot)$. A natural
analogue of the cost function (1.2) is
$$
W^h(x, u^h) = E_x^{u^h} \int_0^{\tau_h} e^{-\beta t}\, k(\psi^h(t), u^h(t))\,dt
 + E_x^{u^h}\, e^{-\beta\tau_h}\, g(\psi^h(\tau_h)),
\tag{3.6}
$$
where the symbol $u^h$ denotes the use of the admissible control $u^h(\cdot)$.

The Dynamic Programming Equation for Cost (3.6) and Process
$\psi^h(\cdot)$. The dynamic programming equation is the same as (2.4) except for
a slight difference in the discount factor. Note that
$$
E_{x,n}^{h,\alpha} \int_0^{\Delta\tau_n^h} e^{-\beta s}\,ds
 = \frac{\Delta t^h(x,\alpha)}{1 + \beta\,\Delta t^h(x,\alpha)}
$$
and
$$
E_{x,n}^{h,\alpha}\, e^{-\beta\Delta\tau_n^h} = \frac{1}{1 + \beta\,\Delta t^h(x,\alpha)}.
$$
Thus the integral on the right hand side of (3.6) can be written as
$$
E_x^{u^h} \sum_{n=0}^{N_h-1} e^{-\beta\tau_n^h}\, k(\xi_n^h, u_n^h)\,
 \frac{\Delta t^h(\xi_n^h, u_n^h)}{1 + \beta\,\Delta t^h(\xi_n^h, u_n^h)}.
$$
Consequently, the effective average discount factor from time $\tau_n^h$ to time
$\tau_{n+1}^h$, given that $\xi_n^h = x$ and $u_n^h = \alpha$, is
$$
\frac{1}{1 + \beta\,\Delta t^h(x,\alpha)}
 = \exp[-\beta\,\Delta t^h(x,\alpha)]\,\big(1 + O((\Delta t^h(x,\alpha))^2)\big).
$$

Define $V^h(x) = \inf_u W^h(x,u)$, where the infimum is over all admissible
control sequences. Then the dynamic programming equation for the con-
trolled chain $\{\xi_n^h, n < \infty\}$ and cost (3.6) is
$$
V^h(x) = \min_{\alpha \in U}\left[
 \frac{1}{1 + \beta\,\Delta t^h(x,\alpha)} \sum_y p^h(x,y|\alpha)\, V^h(y)
 + k(x,\alpha)\,\frac{\Delta t^h(x,\alpha)}{1 + \beta\,\Delta t^h(x,\alpha)}\right],
\tag{3.7}
$$
for $x \in G_h^0$ and with the boundary condition $V^h(x) = g(x)$, $x \notin G_h^0$. We
have $|V^h(x) - V_1^h(x)| \to 0$, and any of the above dynamic programming
equations can be used with the same asymptotic results. We will return to
this equation in connection with the example in Section 4.5.

An Alternative Derivation of (3.7). It is evident from the represen-
tations (3.6) that the cost functions and the optimal cost function for the
$\{\xi_n^h, n < \infty\}$ and $\psi^h(\cdot)$ processes are the same, modulo slight differences
in the discount factor. But, in order to complete the demonstration of
the equivalence of the discrete and continuous parameter problem, let us
formally derive the dynamic programming equation for the optimal value
function $V^h(x)$ for the continuous parameter Markov chain model $\psi^h(\cdot)$ and
cost function (3.6) directly. The formality of the development is mainly in
that we ignore the possibility of multiple events on small intervals $[0,\delta)$,
but it can be shown that the contribution of these possibilities goes to zero
as $\delta \to 0$. By the principle of optimality, for $x \in G_h^0$ and small $\delta > 0$, we
can write
$$
V^h(x) = \min_{\alpha\in U}\left[ k(x,\alpha)\,\delta
 + e^{-\beta\delta}\Big(1 - \frac{\delta}{\Delta t^h(x,\alpha)}\Big) V^h(x)
 + e^{-\beta\delta}\,\frac{\delta}{\Delta t^h(x,\alpha)} \sum_y p^h(x,y|\alpha)\, V^h(y)\right] + o(\delta).
$$
Collecting the coefficients of $V^h(x)$, dividing all terms by $\delta$, multiplying all
terms by $\Delta t^h(x,\alpha)/(1 + \beta\,\Delta t^h(x,\alpha))$, and letting $\delta \to 0$ yields (3.7). Thus,
the minimal cost for the continuous parameter Markov chain interpolation
is just that for the discrete parameter Markov chain model with a particular
form of the discount factor. Let us note that if there is no discounting
($\beta = 0$), then the dynamic programming equation is the same for the
discrete and the continuous time problems.

A Useful Representation of $\psi^h(\cdot)$. We next give an important repre-
sentation for $\psi^h(\cdot)$ which will be useful in the proofs of Chapter 10, and
which also gives us a better intuitive feeling for the relationship between the
processes $\psi^h(\cdot)$ and $x(\cdot)$. Let us define the following "limit of conditional
expectations." By (3.4) and the definition of $b^h(\cdot)$ in (1.3),
$$
\lim_{\delta\to 0} \frac{E_{x,t}^{h,\alpha}\,\Delta\psi^h(t)}{\delta}
 = b(x,\alpha) + \frac{o(\Delta t^h(x,\alpha))}{\Delta t^h(x,\alpha)}.
\tag{3.8}
$$
Now, factoring out this conditional mean rate of change of $\psi^h(\cdot)$ or "com-
pensator," as it is commonly called, and letting $x(0) = \psi^h(0) = x$, we can
write [the expression defines $M^h(\cdot)$]
$$
\psi^h(t) = x + \int_0^t b^h(\psi^h(s), u^h(s))\,ds + M^h(t).
\tag{3.9}
$$
The jumps of $M^h(\cdot)$ are those of $\psi^h(\cdot)$ and, hence, go to zero as $h \to 0$.
Between the jumps, the process is linear in $t$. The process $M^h(\cdot)$ is a
martingale whose quadratic variation is $\int_0^t a^h(\psi^h(s), u^h(s))\,ds$, where $a^h(\cdot)$
is defined by (1.3) and also equals
$$
\begin{aligned}
&\lim_{\delta\to0} \frac{E_{x,t}^{h,\alpha}\,[\Delta\psi^h(t) - E_{x,t}^{h,\alpha}\Delta\psi^h(t)][\Delta\psi^h(t) - E_{x,t}^{h,\alpha}\Delta\psi^h(t)]'}{\delta}\\
&\quad = \lim_{\delta\to0} \frac{E_{x,t}^{h,\alpha}\,[M^h(t+\delta) - M^h(t)][M^h(t+\delta) - M^h(t)]'}{\delta}\\
&\quad = a(x) + \frac{o(\Delta t^h(x,\alpha))}{\Delta t^h(x,\alpha)}.
\end{aligned}
\tag{3.10}
$$
The resemblance of (3.9) to the diffusion (1.1) is more than accidental.
It will be shown in Chapter 10 that, under quite reasonable conditions,
any sequence of $\psi^h(\cdot)$ processes has a subsequence which converges in a
particular sense and the limit satisfies (1.1), with an appropriate admissible
control, and where the stochastic integral is the limit of the $M^h(\cdot)$.
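To make the construction concrete, here is a small simulation sketch (Python,
not from the book) of the Markov process interpolation $\psi^h(\cdot)$: each state is
held for an exponentially distributed time with mean $\Delta t^h(x,\alpha)$ before the
embedded chain jumps. The functions `p_row` and `dt` are hypothetical
placeholders to be supplied by a locally consistent construction.

```python
import numpy as np

def simulate_psi(x0, policy, p_row, dt, t_final, rng=None):
    """Simulate one path of the interpolation psi^h(.) up to time t_final.

    x0     : initial state
    policy : function x -> control alpha (a feedback control)
    p_row  : function (x, alpha) -> (states, probs), the row p^h(x, . | alpha)
    dt     : function (x, alpha) -> Delta t^h(x, alpha)
    Returns the jump times tau_n^h and the states xi_n^h = psi^h(tau_n^h).
    """
    rng = np.random.default_rng() if rng is None else rng
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while t < t_final:
        a = policy(x)
        t += rng.exponential(dt(x, a))           # holding time, mean Delta t^h(x, a)
        ys, probs = p_row(x, a)
        x = ys[rng.choice(len(ys), p=probs)]     # embedded chain transition
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)
```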

4.4 A Random Walk Approximation to the Wiener Process

We now give a simple application of the concepts of the previous section
in order to fix the ideas. Perhaps the most classical use of Markov chain
approximations to diffusion type processes (1.1) is the use of the symmetric
random walk to approximate the Wiener process. First consider the case of
no control. Let $x(t) = x + w(t)$, where $w(\cdot)$ is a real valued Wiener process,
and, for $h > 0$, define the set of points on the line $S_h = \{0, \pm h, \pm 2h, \ldots\}$.
Let $\{\xi_n^h, n < \infty\}$ be a symmetric random walk on the set $S_h$, and define
the interpolation intervals $\Delta t^h(x) = \Delta t^h = h^2$. Define the continuous time
interpolation $\xi^h(\cdot)$ as in Section 4.2. Then $\xi^h(\cdot)$ can be represented in the
form
$$
\xi^h(t) = x + \sum_{i=0}^{t/h^2 - 1} h\rho_i,
$$
where the $\rho_i$ are mutually independent and take values $\pm 1$, each with proba-
bility $1/2$. Then it is well known [13, 52, 93] that a broad class of functionals
of $x(\cdot)$ can be approximated by the same functionals of $\xi^h(\cdot)$ and that $\xi^h(\cdot)$
converges to $x(\cdot)$ in a weak or distributional sense. A similar result holds
for the continuous parameter Markov chain interpolation $\psi^h(\cdot)$.
Now let us add a control, and write the system as
$$
x(t) = x + w(t) + \int_0^t u(s)\,ds,
\tag{4.1}
$$
where we require that $|u(t)| \le 1$. Let $G$ be the interval $[0, B]$, where $B > 0$
is supposed to be an integer multiple of $h$. Let $h \le 1$, and define the
transition probabilities for the approximating Markov chain as
$$
p^h(x, x \pm h\,|\,\alpha) = \frac{1 \pm h\alpha}{2}.
$$
It is easily verified that the chain is locally consistent with $x(\cdot)$ in the sense
of (1.3). The interpolation interval is just $h^2$. The dynamic programming
equation for the cost function $W_1^h(x,u)$ of (2.2) is given by (2.4); namely,
$$
V_1^h(x) = \min_{|\alpha|\le 1} e^{-\beta h^2}\left[\frac{1 + h\alpha}{2}\, V_1^h(x+h)
 + \frac{1 - h\alpha}{2}\, V_1^h(x-h) + k(x,\alpha)\,h^2\right]
\tag{4.2}
$$
for $x \in (0, B)$, with $V_1^h(x) = g(x)$ otherwise.
The dynamic programming equation for the cost (1.2) and system (4.1)
with $G = [0, B]$ is given by (3.4.2), which in the present case takes the
form
$$
\begin{aligned}
0 &= \min_{\alpha\in U}\left[\frac{V_{xx}(x)}{2} + V_x(x)\,\alpha + k(x,\alpha) - \beta V(x)\right],
 \quad x \in (0, B),\\
V(x) &= g(x), \quad x = 0, B.
\end{aligned}
\tag{4.3}
$$
The expression (4.2) can be related to a finite difference approximation to
(4.3) as follows. Let us take a finite difference approximation to (4.3) and
use the following approximations, where $h$ is the difference interval:
$$
V_{xx}(x) \to \frac{V(x+h) + V(x-h) - 2V(x)}{h^2},
\qquad
V_x(x) \to \frac{V(x+h) - V(x-h)}{2h}.
$$

Then, using $V^h(x)$ to denote the finite difference approximation to (4.3),
for $x \in (0, B)$ we have
$$
V^h(x) = \min_{|\alpha|\le 1} \frac{1}{1 + \beta h^2}\left[\frac{1 + \alpha h}{2}\, V^h(x+h)
 + \frac{1 - \alpha h}{2}\, V^h(x-h) + k(x,\alpha)\,h^2\right].
\tag{4.4}
$$
Note that (4.4) is the dynamic programming equation for a discounted cost
problem for a controlled random walk with $\Delta t^h(x,\alpha) = h^2$ and discount
factor $1/(1 + \beta\,\Delta t^h(x,\alpha))$. It is just (3.7) and is an $O(h^4)$ approximation
to (4.2). The consistency of equation (4.4), obtained by using the finite
difference method, with the dynamic programming equation obtained for
the process $\psi^h(\cdot)$ in the last section, or with (4.2), suggests that the control
problem for the Markov chain might actually be a good approximation to
that for the original problem. Appropriate finite difference approximations
can often be used to obtain approximating Markov chains, as will be seen in
Chapter 5. But we emphasize that from the point of view of the convergence
proofs it is the controlled process which is approximated and not the formal
PDE (4.3).
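For this example the numerical solution of (4.4) is particularly simple. The
sketch below (Python, not from the book; the cost functions `k` and `g` are
hypothetical user-supplied placeholders) solves (4.4) by fixed point iteration,
minimizing over a finite grid of control values in $[-1,1]$.

```python
import numpy as np

def solve_random_walk_dp(B, h, beta, k, g, n_alpha=21, tol=1e-10):
    """Fixed point iteration for (4.4) on the grid {0, h, ..., B}.

    k : function (x, a) -> running cost, g : function x -> boundary cost.
    Controls are restricted to a finite grid of values in [-1, 1].
    """
    xs = np.arange(0.0, B + h/2, h)              # grid points; 0 and B are boundary
    alphas = np.linspace(-1.0, 1.0, n_alpha)
    V = np.array([g(x) for x in xs])
    while True:
        V_new = V.copy()
        for i in range(1, len(xs) - 1):
            x = xs[i]
            vals = [((1 + a*h)/2 * V[i+1] + (1 - a*h)/2 * V[i-1]
                     + k(x, a) * h**2) / (1 + beta * h**2)
                    for a in alphas]
            V_new[i] = min(vals)
        if np.max(np.abs(V_new - V)) < tol:
            return xs, V_new
        V = V_new
```

When the running cost does not depend on the control, the minimizing value
can be read off directly as $\alpha = -\mathrm{sign}(V^h(x+h) - V^h(x-h))$, since the
bracket in (4.4) is affine in $\alpha$.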

4.5 A Deterministic Discounted Problem

Many of the concepts which are involved in the use of the Markov chain
approximation method and the convergence proofs can be illustrated by a
discounted cost and a purely deterministic problem. In fact, the example
that follows illustrates the practical use of the approximation and numerical
methods for purely deterministic systems when feedback controls or optimal
value functions are wanted. We will work with the following system:
$$
\dot x(t) = b(x(t), u(t)),
\tag{5.1a}
$$
$$
W(x, u) = \int_0^\infty e^{-\beta t}\, k(x(t), u(t))\,dt, \qquad \beta > 0.
\tag{5.1b}
$$
The functions $b(\cdot)$ and $k(\cdot)$ are assumed to be bounded and continuous.
The admissible controls are just the $U$-valued measurable functions on
the interval $[0,\infty)$, where $U$ is a compact set. In order to assure that the
solution is well defined and to avoid needless complications in the rest of this
chapter, we suppose that $b(\cdot,\alpha)$ is Lipschitz continuous, uniformly in $\alpha$. Let
there be continuous functions $\Delta t^h(x,\alpha)$ which satisfy $k_2 h \ge \Delta t^h(x,\alpha) \ge k_1 h$
for some constants $k_i > 0$. Because the example is used for illustrative
purposes only and we wish to keep it simple, we do not include a stopping
boundary or target set. Thus, the control is over the infinite time interval
and infinite space. In this sense, we are not necessarily working with a
computationally feasible algorithm. But the simplifications enable the main
points to be made without introducing excessive notation or concepts.

Remark on Stopping Boundaries. If a stopping boundary or target set
is introduced, then we need to add a condition which, loosely speaking,
reads as follows: For each $\epsilon > 0$, there is an $\epsilon$-optimal control such that
the corresponding trajectory of (5.1a) is not "tangent" to the boundary at
the first contact time. This will be dealt with in detail in the context of
the stochastic problem in Sections 10.4 and 10.5, and for the deterministic
problem in Section 13.2.

An Approximation to (5.1a). We start by following a time honored
procedure and use a discrete time approximation to (5.1a), but where the
time intervals between successive updates might depend on the current
state and control. It will be seen that a natural approach to the com-
putational problem can be interpreted as a computational problem for a
controlled "approximating" Markov chain. Let $h$ denote an approximation
parameter. Use the discrete time approximation to (5.1a) given by
$$
\bar\xi_{n+1}^h = \bar\xi_n^h + b(\bar\xi_n^h, u_n^h)\,\Delta t^h(\bar\xi_n^h, u_n^h),
$$
where $u_n^h$ is the actual control which is used at the $n$-th update. Define
the sequence $u^h = \{u_n^h, n < \infty\}$ and the interpolation interval $\Delta t_n^h =
\Delta t^h(\bar\xi_n^h, u_n^h)$. Define $t_n^h = \sum_{i=0}^{n-1}\Delta t_i^h$. Define the continuous parameter pro-
cess $\bar\xi^h(\cdot)$ by $\bar\xi^h(t) = \bar\xi_n^h$, $t \in [t_n^h, t_{n+1}^h)$. It is the $\bar\xi^h(\cdot)$ which approximates
the solution of (5.1a). A reasonable approximation to the cost function
(5.1b) is
$$
W^h(x, u^h) = \sum_{n=0}^\infty e^{-\beta t_n^h}\, k(\bar\xi_n^h, u_n^h)\,\Delta t_n^h.
$$
The dynamic programming equation for the optimal cost is
$$
\hat V^h(x) = \min_{\alpha\in U}\left[ e^{-\beta\Delta t^h(x,\alpha)}\,
 \hat V^h\big(x + b(x,\alpha)\,\Delta t^h(x,\alpha)\big) + k(x,\alpha)\,\Delta t^h(x,\alpha)\right].
\tag{5.2}
$$

Interpreting (5.2) in Terms of an Approximating Markov Chain.
We will approximate (5.2) by a type of finite element method. The problem
is illustrated in Figure 4.2. In the figure, the sides of the triangles are $O(h)$.
We take an approximation to $\hat V^h(\cdot)$ of the following form: Approximate
the optimal cost function by a continuous function which is linear in each
triangle of the figure. We will show that this leads directly to a Markov
chain interpretation of (5.2), and that a simple probabilistic method can
be used to prove the convergence of $\hat V^h(x)$ to $V(x)$, as well as to get an
approximate solution to (5.2). Let $z(x,\alpha) = x + b(x,\alpha)\,\Delta t^h(x,\alpha)$ denote
the point which is reachable from $x$ under control $\alpha$ in direction $b(x,\alpha)$ in
time $\Delta t^h(x,\alpha)$. Refer to the figure, where the values are plotted for two
values of $\alpha$. Let $Y^h(x,\alpha)$ denote the corners of the triangle in which $z(x,\alpha)$
falls. For example, in the figure $Y^h(x,\alpha_2) = \{x, y_1, y_2\}$. We can represent
the point $z(x,\alpha)$ as a convex combination of the points in $Y^h(x,\alpha)$. Let
$p^h(x,y|\alpha)$ denote the weights used for the convexification. These weights
are nonnegative and sum to unity. Hence, they can be considered to be
transition probabilities for a controlled Markov chain whose state space is
just the set of all corner points in the figure.

Figure 4.2. A piecewise linear approximation, showing the point
$z(x,\alpha) = x + b(x,\alpha)\,\Delta t^h(x,\alpha)$ inside a triangle of the grid.
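A minimal sketch (Python, not from the book) of how the weights $p^h(x,y|\alpha)$
can be computed: express $z(x,\alpha)$ in barycentric coordinates with respect to
the corners of the triangle that contains it. Locating the containing triangle
is assumed to be handled by the grid data structure; here the corners are
simply passed in, and the numerical values are made up for illustration.

```python
import numpy as np

def barycentric_weights(z, corners):
    """Weights p0, p1, p2 >= 0 with sum 1 such that z = p0*c0 + p1*c1 + p2*c2,
    where corners = (c0, c1, c2) are the vertices (2-vectors) of a triangle
    containing z.  These are the transition probabilities p^h(x, y | a),
    one weight for each corner y."""
    c0, c1, c2 = (np.asarray(c, dtype=float) for c in corners)
    A = np.column_stack([c1 - c0, c2 - c0])   # 2x2, nonsingular for a proper triangle
    p1, p2 = np.linalg.solve(A, np.asarray(z, dtype=float) - c0)
    return np.array([1.0 - p1 - p2, p1, p2])

# Example: z(x, a) = x + b(x, a) * dt lies in the triangle {x, y1, y2}.
x, y1, y2 = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.0, 0.1])
z = x + np.array([0.3, 0.5]) * 0.1            # b(x, a) = (0.3, 0.5), dt = 0.1
print(barycentric_weights(z, (x, y1, y2)))    # nonnegative weights summing to 1
```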

Now let $V^h(\cdot)$ denote the continuous piecewise linear approximation to
$\hat V^h(\cdot)$ and rewrite (5.2) in terms of the transition probabilities as follows:
$$
V^h(x) = \min_{\alpha\in U}\Big[ e^{-\beta\Delta t^h(x,\alpha)}
 \sum_{y\in Y^h(x,\alpha)} p^h(x,y|\alpha)\, V^h(y) + k(x,\alpha)\,\Delta t^h(x,\alpha)\Big].
\tag{5.3}
$$
Comparing (5.3) to (2.4), it is seen that the finite element approxima-
tion yields a solution which is just an optimal cost function for a con-
trolled Markov chain. We will next show, via a straightforward probabilistic
method, that $V^h(x)$ converges to $V(x)$ as $h \to 0$.

Let $u^h(x)$ be a minimizing value of $\alpha$ in (5.3), $\{\xi_n^h, n < \infty\}$ the approx-
imating controlled chain, and let $\{u_n^h, n < \infty\}$ denote the actual random
variables which are the optimal control actions. Define $\Delta t_n^h = \Delta t^h(\xi_n^h, u_n^h)$
and $t_n^h = \sum_{i=0}^{n-1}\Delta t_i^h$. Then
$$
E[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = x, u_n^h = \alpha] = b(x,\alpha)\,\Delta t^h(x,\alpha) = O(h),
\tag{5.4a}
$$
$$
\mathrm{cov}[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = x, u_n^h = \alpha] = O(h^2).
\tag{5.4b}
$$

Convergence of the Approximations $\xi^h(\cdot)$ and $V^h(x)$: Part (a).
Let $E_n^h$ denote the expectation, given the state and control actions up to
and including time $n$. In order to see the resemblance of the stochastic
problem to the original deterministic problem, rewrite $\xi^h(\cdot)$ in the more
convenient form
$$
\xi^h(t) = x + \sum_{n:\, t_{n+1}^h \le t} \Delta\xi_n^h.
$$
Write $\Delta\xi_n^h = \xi_{n+1}^h - \xi_n^h$ in the form $\Delta\xi_n^h = E_n^h\Delta\xi_n^h + (\Delta\xi_n^h - E_n^h\Delta\xi_n^h)$. Then
$$
\xi^h(t) = x + \sum_{n:\, t_{n+1}^h \le t} E_n^h\Delta\xi_n^h
 + \sum_{n:\, t_{n+1}^h \le t} \big(\Delta\xi_n^h - E_n^h\Delta\xi_n^h\big).
\tag{5.5}
$$
We will next show that only the "mean increments" (the first sum on the
right) of $\xi^h(\cdot)$ are important and that there is some control such that $\xi^h(\cdot)$
is a good approximation to a solution to (5.1a) under that control.

The right hand sum in (5.5) is a continuous time interpolation of a
martingale. By (5.4b), its variance is $E\sum_{n:\, t_{n+1}^h \le t}\Delta t_n^h\, O(h) = O(h)\,t$. By
(1.1.3), this implies that for any $t < \infty$
$$
E \sup_{s \le t}\Big|\sum_{n:\, t_{n+1}^h \le s}\big(\Delta\xi_n^h - E_n^h\Delta\xi_n^h\big)\Big|^2
 \to 0 \quad \text{as } h \to 0.
$$
Thus the effects of that right hand term in (5.5) disappear in the limit. The
basic reason for this is that the spatial and the temporal scales are both of
the order of $h$.

We write the right hand sum in (5.5) simply as $O(h)$, and this is the
order of that term for approximations of deterministic problems in general.
Define the continuous parameter interpolation $u^h(\cdot)$ by $u^h(t) = u_n^h$ on the
interval $[t_n^h, t_{n+1}^h)$. Now, using (5.4a), we have
$$
\begin{aligned}
\xi^h(t) &= x + \sum_{n:\, t_{n+1}^h \le t} E_n^h\Delta\xi_n^h + O(h)\\
 &= x + \int_0^t b(\xi^h(s), u^h(s))\,ds + O(h).
\end{aligned}
\tag{5.6}
$$
We now proceed to show that there is some admissible control such that
the paths of $\xi^h(\cdot)$ are actually good approximations to a solution of (5.1a)
under that control. Because $\Delta\xi_n^h = O(h)$ and $\Delta t_n^h \ge k_1 h$, the (piecewise
linear interpolations of the) paths of the process $\xi^h(\cdot)$ are equicontinuous
(in $\omega$ and $h$). Thus, for each fixed value of the probability space variable
$\omega$, each subsequence of $\{\xi^h(\cdot)\}$ has a further subsequence which converges
to some limit uniformly on each bounded time interval. Suppose that the
same were true of the sequence of interpolated control paths $u^h(\cdot)$. We

note now, for future reference, that this is hard to guarantee. This will be
the reason for the introduction of an expanded class of admissible controls
in Section 4.6 below. However, until further notice, we do proceed under
the assumption that the paths of the control processes are equicontinuous
in $h$ and $\omega$. Even if this assumption is not true, the convergence result
for $V^h(x)$ will remain true. The details for the general case will be given
in Section 4.6. Continuing, fix the sample space variable $\omega$ and let $h_n(\omega)$
index a convergent subsequence of (the piecewise linear interpolations of)
$\{\xi^h(\cdot), u^h(\cdot)\}$, with limit denoted by $x(\cdot,\omega), u(\cdot,\omega)$. By the convergence
and the compactness of $U$, we have $u(t,\omega) \in U$. The uniform convergence
implies that
$$
x(t,\omega) = x + \int_0^t b(x(s,\omega), u(s,\omega))\,ds.
\tag{5.7}
$$
Thus, the limit path satisfies the original ODE (5.1a) with an admissible
control. Also, it is easily seen that $V^{h_n(\omega)}(x)$ converges to $W(x, u(\omega))$. Due
to the minimality of $V(x)$, we have
$$
W(x, u(\omega)) \ge V(x).
\tag{5.8}
$$
Because this holds for each $\omega$,
$$
\liminf_h V^h(x) \ge V(x).
\tag{5.9}
$$
Convergence of $V^h(x)$ to $V(x)$: Part (b). In order to get the desired
convergence result, we need to get the reverse inequality to (5.9). In order
to do this, we will jump ahead a little and assume a result which will be
discussed in Section 4.6: Namely, that there exists an optimal admissible
control for the problem (5.1) and that the control can be arbitrarily well
approximated by a control which is piecewise constant. That is, given any
$\epsilon > 0$, there is an $\epsilon$-optimal control $u^\epsilon(\cdot)$ of the following form: There is
$\delta > 0$ and a finite set of points $U_\epsilon$ in $U$ such that $u^\epsilon(\cdot)$ is $U_\epsilon$-valued
and is constant on the intervals $[i\delta, i\delta + \delta)$. This fact remains true when
working with the expanded class of controls which will be introduced later.
We will next apply this $\epsilon$-optimal control to the Markov chain and use the
minimality of $V^h(x)$ for the controlled Markov chain problem to get the
reverse inequality to (5.9).

The procedure is as follows. Fix $\epsilon > 0$. Define a sequence $u^{h,\epsilon}(\cdot)$ of
controls for the chain by adapting the above $\epsilon$-optimal control $u^\epsilon(\cdot)$ in
the following natural way: Let $\{\xi_n^{h,\epsilon}, n < \infty\}$ denote the chain associated
with this new (to be defined) control. Let $h$ be small enough such that $\delta >
\sup_{x,\alpha}\Delta t^h(x,\alpha)$. Define the sequences $u_n^{h,\epsilon}$, $\Delta t_n^h = \Delta t^h(\xi_n^{h,\epsilon}, u_n^{h,\epsilon})$,
and $t_n^h = \sum_{i=0}^{n-1}\Delta t_i^h$, recursively by $u_n^{h,\epsilon} = u^\epsilon(i\delta)$ for all $n$ such
that $t_n^h \in [i\delta, i\delta + \delta)$, for each $i$. The constructed control is an admissible
control for the Markov chain. Let $\xi^{h,\epsilon}(\cdot)$ and $u^{h,\epsilon}(\cdot)$ denote the continuous
parameter interpolations [interpolation intervals $\Delta t_n^h$]. Note that $u^{h,\epsilon}(\cdot)$
converges to $u^\epsilon(\cdot)$ as $h \to 0$, except possibly at the points $i\delta$. Fix $\omega$ and
choose a convergent subsequence of $\xi^{h,\epsilon}(\cdot)$ (the sequence need not be the
same for each $\omega$). Let $x^\epsilon(\cdot,\omega)$ denote the limit. Then, following the analysis
which led to (5.7), we get that
$$
x^\epsilon(t,\omega) = x + \int_0^t b(x^\epsilon(s,\omega), u^\epsilon(s))\,ds.
$$
The limit paths $x^\epsilon(\cdot,\omega)$ are all the same, irrespective of the chosen subse-
quence or of $\omega$, because the solution to (5.1a) is unique under the chosen
control $u^\epsilon(\cdot)$. This implies that the sequence $W^h(x, u^{h,\epsilon})$ converges to the
$\epsilon$-optimal cost $W(x, u^\epsilon)$. Now, using the optimality of $V^h(x)$, we have
$$
\limsup_h V^h(x) \le \lim_h W^h(x, u^{h,\epsilon}) = W(x, u^\epsilon) \le V(x) + \epsilon.
\tag{5.10}
$$
Inequalities (5.9) and (5.10) imply that $V^h(x) \to V(x)$.

Remark on the Analogous Argument for a Stochastic Problem.
For the general stochastic problem of Section 4.1, we will not be able to
duplicate the above proof by choosing a convergent subsequence for each
$\omega$. Even if there is equicontinuity, there will be no guarantee that the limit
is actually a sample value of a random variable. But somewhat more gen-
eral compactness methods can be used there. These will be developed in
Chapter 9 and used in Chapter 10.

We now turn to the general case for the deterministic problem where
the control sequences $u^h(\cdot)$ are not "nice enough" for there to exist the
convergent subsequences, and show how this case can be handled. This
requires a diversion, but the concepts will also be of use for the general
stochastic problem.

4.6 Deterministic Relaxed Controls

There need not exist an optimal control for problems such as (5.1) in the
sense that there is a $U$-valued measurable function $\bar u(\cdot)$ such that
$$
\inf_u W(x,u) = V(x) = W(x,\bar u),
$$
where the infimum is over all the $U$-valued measurable functions. Our
primary concern is with getting a good approximation to $V(\cdot)$ and with
feedback controls which yield costs which are close to the infimum. The
numerical methods will give feedback controls, but in order to be able to
prove that the values of the costs which are given by the numerical algo-
rithms converge to the infimum of the costs as the approximation parameter
$h$ converges to zero, we need to know that there is an optimal control in
some appropriate sense. In particular, as seen in the argument at the end
of the last section, we will need to know that there is a reasonable class
of admissible controls, and an optimal control in that class which can be
approximated by a "nice" piecewise constant control, with arbitrarily small
penalty. The class of relaxed controls to be defined below was introduced
for just such a purpose [10, 152]. In particular,
$$
\inf_{\text{relaxed controls}} W(x,m) = \inf_{\text{ordinary controls}} W(x,u).
$$
The class of relaxed controls will be of importance in the proofs only. They
do not enter into the numerical algorithms or into actual applications.
Example of Nonexistence of an Optimal Control. For motivation,
consider the following artificial example: $U = [-1, 1]$, $\beta > 0$,
$$
\dot x(t) = b(x(t), u(t)) = u(t), \qquad
W(x,u) = \int_0^\infty e^{-\beta t}\big[x^2(t) + \big(1 - u^2(t)\big)\big]\,dt.
\tag{6.1}
$$
Note that $V(0) = 0$. To see this, define the sequence of controls $u^n(\cdot)$ by
$$
u^n(t) = (-1)^k \ \text{ on } [k/n, (k+1)/n), \qquad k = 0, 1, \ldots.
$$
It is not hard to see that $W(0, u^n) \to 0$ as $n \to \infty$. In a sense, when
$x(0) = 0$, the "optimal control wants to take values $\pm 1$ simultaneously."
But there is no optimal control in the usual sense.
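A quick numerical illustration (Python, not from the book) of why
$W(0, u^n) \to 0$: integrate the dynamics under the chattering control $u^n$ and
evaluate the discounted cost by a Riemann sum. The running cost used below
is the integrand assumed in (6.1) above, and the horizon and step size are
arbitrary illustrative choices.

```python
import numpy as np

def chattering_cost(n, beta=1.0, t_final=30.0, dt=1e-3):
    """Approximate W(0, u^n) for u^n(t) = (-1)^k on [k/n, (k+1)/n),
    with x' = u, x(0) = 0, and running cost x^2 + (1 - u^2)."""
    x, cost = 0.0, 0.0
    for t in np.arange(0.0, t_final, dt):
        u = (-1.0) ** int(np.floor(t * n))        # chattering control
        cost += np.exp(-beta * t) * (x**2 + (1.0 - u**2)) * dt
        x += u * dt                               # Euler step for x' = u
    return cost

for n in (1, 10, 100):
    print(n, chattering_cost(n))   # the cost decreases toward 0 as n grows
```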
The example is admittedly artificial, but more realistic examples arise
when the control value space is not "rich enough"; more particularly, when
the set of values $(b(x,U), k(x,U))$ is not convex for each $x$. As will be seen,
by definition the relaxed control is a measure on the control value space,
and allows all values of $U$ to be used "simultaneously" with appropriate
weights. In preparation for the definition, we next show how to represent
an ordinary control as a measure.

The Representation of a Control as a Measure. Let $u(\cdot)$ be an ad-
missible control and let $\mathcal{B}(U)$ and $\mathcal{B}(U \times [0,\infty))$ denote the $\sigma$-algebras over
the Borel sets in $U$ and $U \times [0,\infty)$, respectively. Define the measures $m_t(\cdot)$
on $\mathcal{B}(U)$ and $m(\cdot)$ on $\mathcal{B}(U \times [0,\infty))$ by
$$
m_t(A) = I_A(u(t)), \qquad
m(A \times [0,t]) = \int_0^t m_s(A)\,ds.
\tag{6.2}
$$
We can now write (5.1) as
$$
\dot x(t) = \int_U b(x(t), \alpha)\, m_t(d\alpha),
$$

or as
$$
x(t) = x + \int_0^t\!\!\int_U b(x(s),\alpha)\, m_s(d\alpha)\,ds
 = x + \int_0^t\!\!\int_U b(x(s),\alpha)\, m(d\alpha\,ds),
\tag{6.3}
$$
with
$$
W(x,m) = \int_0^\infty\!\!\int_U e^{-\beta s}\, k(x(s),\alpha)\, m(d\alpha\,ds).
$$
$m(A \times [0,t])$ is just the total integrated time over the interval $[0,t]$ that
the control $u(\cdot)$ takes values in the set $A \subset U$. A relaxed control is just a
generalization of such $m(\cdot)$, and we now give the general definition.

Definition: Relaxed Control. An admissible relaxed control or simply a
relaxed control $m(\cdot)$ is a measure on $\mathcal{B}(U \times [0,\infty))$ such that $m(U \times [0,t]) = t$
for all $t$. Given a relaxed control $m(\cdot)$, there is a derivative $m_t(\cdot)$ such that
$m(d\alpha\,dt) = m_t(d\alpha)\,dt$. In fact, we can define the derivative by
$$
m_t(A) = \lim_{\delta\to 0} \frac{m(A \times [t-\delta, t])}{\delta}.
$$
Example. Let $\alpha_i \in U$, $i = 1, 2$, and let $m_t(\cdot)$ be the measure which takes
the value $1/2$ at each of the points $\alpha_i$. Then the ODE in (6.3) can be
written as
$$
\dot x = \frac{1}{2}\big[b(x,\alpha_1) + b(x,\alpha_2)\big]
 = \int_U b(x,\alpha)\, m_t(d\alpha).
\tag{6.4}
$$
With the use of a relaxed control, the set of possible velocities and cost
rates $(b(x,U), k(x,U))$ is replaced by its convex hull. The relaxed control
$m(\cdot)$ for which $W(0,m) = 0$ and $x(t) = 0$ in (6.1) is the one for which $m_t(\cdot)$
is concentrated on the points $\pm 1$, each with mass $1/2$, for all $t$.

Remarks on Relaxed Controls. To see why relaxed controls are needed
when working with a sequence of controls and corresponding solutions to
(5.1a), consider the following example. Let $u^n(t) = (-1)^{[nt]}$, where $[nt]$
denotes the integer part of $nt$, and let $m^n(\cdot)$ denote its relaxed control
representation, as in (6.2). That is, the measure $m_t^n(\cdot)$ is concentrated at
the point $u^n(t)$. The sequence of controls $\{u^n(\cdot), n < \infty\}$ does not converge
in any of the usual senses. But the sequence $\{m^n(\cdot), n < \infty\}$ converges in
the "weak" sense; i.e., for any bounded and continuous function $\phi(\cdot)$ and
$t < \infty$,
$$
\int_0^t\!\!\int_U \phi(\alpha, s)\, m^n(d\alpha\,ds)
 \to \int_0^t\!\!\int_U \phi(\alpha, s)\, m(d\alpha\,ds),
\tag{6.5}
$$
where $m(\cdot)$ is the relaxed control with derivative $m_t(-1) = m_t(1) = 1/2$.
If $x^n(\cdot)$ is the solution to (5.1a) under control $u^n(\cdot)$, then $x^n(\cdot)$ converges
to $x(\cdot)$ which satisfies (6.4) with $\alpha_1 = 1$ and $\alpha_2 = -1$. In fact, for this
example, $m^n(\{+1\} \times [0,t])$ is just the total part of the time interval $[0,t]$
on which $u^n(s) = 1$, and it equals $t/2 + O(1/n)$.

For any sequence of relaxed admissible controls, there is always a subse-
quence which converges in the sense of (6.5). The introduction of relaxed
controls has the effect of making the control appear essentially linearly in
the dynamics and cost function.
We saw in an example above that there might not be an optimal control
in the class of ordinary admissible controls. But there always is one in the
class of relaxed controls. More detail is in Chapters 9 and 10. The following
approximation result is important in applications because it says that any
admissible relaxed control can be well approximated by a "nice" ordinary
admissible control.

Approximation Theorem: The "Chattering" Theorem. Suppose the
ODE in (6.3) has a unique solution under some given relaxed control $m(\cdot)$,
and let $k(\cdot)$ and $b(\cdot)$ be bounded and continuous. Then for any $\epsilon > 0$ and
$T < \infty$, there is a $\delta > 0$, a finite set $U_\epsilon \subset U$, and a $U_\epsilon$-valued ordinary
control $u^\epsilon(\cdot)$ which is constant on the intervals $[i\delta, i\delta + \delta)$ and is such that
$$
\sup_{t \le T}\, |x(t,m) - x(t,u^\epsilon)| \le \epsilon, \qquad
|W(x,m) - W(x,u^\epsilon)| \le \epsilon.
\tag{6.6}
$$
Here $x(t,m)$ is the solution under the denoted control $m(\cdot)$.

Completion of the Argument of Section 4.5. Even if the sequence of
controls $u^h(\cdot)$ used in Section 4.5 does not have a convergent subsequence
for each $\omega$, the sequence of its relaxed control representations $m^h(\cdot)$ always
will. Furthermore, any relaxed control (optimal or $\epsilon$-optimal) can be ar-
bitrarily well approximated by a piecewise constant ordinary control. In
addition, the infima of the cost over the classes of ordinary and relaxed
controls are the same. With these points in mind, we can complete the
proof of convergence in Section 4.5 in general.
5 Construction of the Approximating Markov Chain

In this chapter we develop some canonical methods for obtaining approx-


imating Markov chains which are locally consistent with the controlled
diffusion in the sense used in (4.1.3). We also deal with the controlled jump
diffusion and reflected processes in Sections 5.6 and 5.7. The chapter out-
lines two classes of basic methods and a number of variations of each of
them. One purpose is to describe methods which are readily programmable.
But we also wish to show the versatility and intuitive nature of the gen-
eral approach. There are many variations of the methods discussed. Once
the general procedures are clear, the reader can adapt them to particular
problems which might not be directly covered by the discussion. The devel-
opment does not directly cover processes on manifolds such as the surface
of a sphere or torus, but the various possibilities should be apparent.
The first method to be discussed is called a "finite difference" method,
although its validity does not depend on the validity of any finite differ-
ence approach to the solution of partial differential equations. The finite
difference approximations are used as guides to the construction of locally
consistent approximating Markov chains only. It turns out that when a
carefully chosen finite difference approximation is applied to the differ-
ential operator of the controlled process, the coefficients of the resulting
discrete equation can serve as the desired transition probabilities and in-
terpolation interval. Once these are available, we use purely probabilistic
methods for dealing with them, because the partial differential equations
which are approximated might not have the smoothness which is required
for validity of the classical finite difference approach. Also, once the tran-
sition probabilities are obtained with this method, they can be altered in

many ways, according to numerical convenience, while keeping consistency.


In order to motivate the method, we start by discussing several special ex-
amples in Section 5.1. Various numerical implications and variations of the
choices are discussed at the end of the section and in Section 5.2. The gen-
eral "finite difference" method is described in Section 5.3. The method, as
developed in its simplest form, sometimes fails if the off diagonal terms of
the noise covariance matrix are too large. Several methods for overcoming
this problem are discussed in Section 5.3. The notion of local consistency
of an approximating chain is analogous to the notion of consistency for a
finite difference approximation, except that here it is the controlled process
which is being approximated, and not a partial differential equation.
Section 5.4 describes a very versatile "direct" method for getting good
approximating chains, essentially by decomposing the "local" effects of the
"drift" and "noise" parts of the original process. It is a generalization of
the approach of Sections 5.1 to 5.3. Several illustrations show how the state
space and transition functions of the approximating chains can be tailored
to the particular problem.
Section 5.5 contains some introductory comments concerning variable
grids. The spacing of the points in the approximating chain might vary
from one region of the state space to another because we need greater ac-
curacy in certain regions than in others. Because of this, we might lose
local consistency on the boundaries of the regions. A particular example of
a "variable grid" and the lack of local consistency is discussed. The proba-
bilistic approach to the convergence proofs works when the approximated
process spends little time in a neighborhood of the points where we lack
consistency. This covers many cases of interest. The example illustrates the
versatility of the method. The approximating chain for the jump diffusion
process is discussed in Section 5.6. The approximating chains are derived
by simply piecing together in an obvious way the approximation for the
diffusion alone with an approximation to the jumps. Approximations for
reflecting boundaries are dealt with in Section 5.7. The Skorokhod Prob-
lem model for a reflecting jump diffusion is used. The essential requirement
for the approximating chain on the reflecting boundary is that the mean
direction of motion be an approximation to a reflection direction for the
original problem. In Section 5.8, the dynamic programming equations for
the problems of Chapter 2 are written in the notation of the approximating
Markov chains for use in subsequent chapters.
If the variance is controlled or is highly state dependent, there is no
problem with the proofs of convergence (see Chapter 13), but more care
needs to be exercised in the construction of the algorithms. It might be
impossible to get locally consistent approximating chains with only local
transitions. The coding becomes more complex and often one needs to strike
a balance between complexity of coding and a small level of "numerical
noise." Such issues are discussed in Section 5.9 and in [99]. State dependent
and controlled variance is a common occurrence in models originating in

financial mathematics.

5.1 Finite Difference Type Approximations: One Dimensional Examples

In this and in the next three sections, several convenient methods for get-
ting locally consistent approximating chains will be demonstrated. As with
any method of numerical approximation, once the general concepts are
understood one can use ingenuity to exploit the unique features of each
special case. It is useful to start with a discussion of the relation between
the classical finite difference approximation method for elliptic PDE's and
the Markov chain approximation method, so that we can see that we are
actually on familiar ground. It will be seen that an essentially standard use
of a finite difference approximation of the differential operator of the con-
trolled process will yield the transition probabilities and interpolation in-
terval as the coefficients in the finite difference representation. The general
idea of the approach via a finite difference method can be best introduced
by means of a few simple one dimensional examples. These will illustrate
the essentially automatic nature of the generation of the transition proba-
bilities $p^h(x,y|\alpha)$ of the chain and the interpolation intervals $\Delta t^h(x,\alpha)$. In
these motivating examples, all the stochastic and partial differential equa-
tions and the uses of Ito's formula will be dealt with in a purely formal
way. We are only concerned with getting a locally consistent chain. Once
the formalities of the "finite difference" derivation are over, it can be readily
verified that the derived chain satisfies the required consistency properties.

Recall that the controlled chain $\{\xi_n^h, n < \infty\}$ is said to be locally consis-
tent with the controlled diffusion process
$$
dx = b(x,u)\,dt + \sigma(x)\,dw
\tag{1.1}
$$
if (4.1.3) holds for interpolation intervals satisfying $\sup_{x,\alpha}\Delta t^h(x,\alpha) \to 0$
as $h \to 0$. For these one dimensional examples, the process $x(\cdot)$ will be
of interest on the interval $G = [0, B]$, $B > 0$. It is not really necessary to
specify the interval, but it might help to fix ideas for the actual control
problem. We will use $h > 0$ as the approximation parameter, and suppose
that $B$ is an integral multiple of $h$ for all values of $h$ of interest. Define
the sets $S_h = \mathbb{R}_h = \{0, \pm h, \pm 2h, \ldots\}$ and $G_h^0 = G^0 \cap S_h$, where $G^0$ is the
interior of $G$.

Example 1. An Uncontrolled Wiener Process. Define the uncon-
trolled process $x(t) = x + \sigma w(t)$, where $w(\cdot)$ is a standard real valued
Wiener process, $x$ is the initial condition and $\sigma$ is a constant. Define the
first exit time $\tau = \min\{t : x(t) \notin (0,B)\}$ and the cost functional
$$
W(x) = E_x\int_0^\tau k(x(s))\,ds,
$$
where $k(\cdot)$ is a bounded and continuous function. This function, as well as
the interval $G$, play only an auxiliary and formal role in the development.
Their values are not important. By Ito's formula, if $W(\cdot)$ is smooth enough,
then it satisfies the differential equation (Section 3.3)
$$
\mathcal{L}W(x) + k(x) = 0, \quad x \in (0,B),
\qquad W(0) = W(B) = 0,
\tag{1.2}
$$
where $\mathcal{L} = (\sigma^2/2)(d^2/dx^2)$ is the differential operator of the process $x(\cdot)$.
We will obtain the desired transition probabilities and interpolation in-
terval simply by trying to solve the differential equation (1.2) by finite
differences. The standard approximation
$$
f_{xx}(x) \to \frac{f(x+h) + f(x-h) - 2f(x)}{h^2}
\tag{1.3}
$$
for the second derivative will be used. Now use (1.3) in (1.2), denote the
result by $W^h(x)$, and get (for $x \in G_h^0$)
$$
\frac{\sigma^2}{2}\,\frac{W^h(x+h) + W^h(x-h) - 2W^h(x)}{h^2} + k(x) = 0,
$$
which can be rewritten as
$$
W^h(x) = \frac12 W^h(x+h) + \frac12 W^h(x-h) + \frac{h^2}{\sigma^2}\,k(x),
 \quad x \in G_h^0.
\tag{1.4a}
$$
The boundary conditions are
$$
W^h(0) = W^h(B) = 0.
\tag{1.4b}
$$
Equation (1.4) has a simple interpretation in terms of a Markov chain
(recall a similar problem in Section 4.4). Let $\{\xi_n^h, n < \infty\}$ be the symmetric
random walk on the state space $S_h$ and define $\Delta t^h = h^2/\sigma^2$. The transition
probabilities of the random walk are $p^h(x, x \pm h) = 1/2$. In terms of these
transition probabilities, we can rewrite (1.4a) as
$$
W^h(x) = p^h(x,x+h)\,W^h(x+h) + p^h(x,x-h)\,W^h(x-h) + k(x)\,\Delta t^h,
\tag{1.5}
$$
$x \in G_h^0$. Using $N_h$ for the first escape time of the chain from the set $G_h^0$,
the solution of (1.5) can be written as (see Chapter 2) a functional of the
path of the chain in the following way:
$$
W^h(x) = E_x \sum_{n=0}^{N_h - 1} k(\xi_n^h)\,\Delta t^h.
\tag{1.6}
$$

Note that $E_{x,n}^h\Delta\xi_n^h = 0$ and the conditional covariance is $E_{x,n}^h(\Delta\xi_n^h)^2 = h^2 =
\sigma^2\Delta t^h$. Thus, the chain is locally consistent with the Wiener process, in the
sense of (4.1.3). The use of a random walk to approximate a Wiener process
is well known. The continuous time interpolations $\xi^h(\cdot)$ (Section 4.2) and
$\psi^h(\cdot)$ (Section 4.3) both converge in distribution (Chapters 9 and 10) to a
Wiener process. In fact, Theorem 10.5.1 implies that $W^h(x) \to W(x)$. The
convergence result is also a consequence of the Donsker invariance principle
[52]. The finite difference approximation of the differential operator $\mathcal{L}$ of
the Wiener process gives the transition probabilities and the interpolation
interval immediately, whether or not $W(x)$ actually satisfies (1.2). [In this
particular case, (1.2) does hold.] The partial differential equation (1.2) and
the function $k(\cdot)$ only served the formal and auxiliary purpose of being
a vehicle for the calculation of the transition probabilities and the inter-
polation interval. The transition probabilities were found by choosing an
appropriate finite difference approximation to (1.2), collecting terms, and
dividing all terms in the resulting expression by the coefficient of $W^h(x)$.
Example 2. An Uncontrolled Deterministic Case. This example uses
an uncontrolled ordinary differential equation model. Consider the system
$\dot x = b(x)$, where $x$ is real valued and $b(\cdot)$ is bounded and continuous.
First, suppose that $\inf_x |b(x)| \ne 0$. As in Example 1, define the functional
$W(x) = \int_0^\tau k(x(s))\,ds$. Formally, $W(\cdot)$ solves the ODE (1.2), where now
$\mathcal{L} = b(x)\,d/dx$ is just a first order differential operator. In particular, if
$W(\cdot)$ is smooth enough we have
$$
W_x(x)\,b(x) + k(x) = 0, \quad x \in (0,B),
\qquad W(0) = W(B) = 0.
\tag{1.7}
$$
Owing to the unique direction of flow for each initial condition, only one
of the two boundary conditions is relevant for all $x \in (0,B)$. As was done
with Example 1, the Markov chain approximation to $x(\cdot)$ will be obtained
by use of a finite difference approximation to the derivative in (1.7). But we
will need to choose the difference approximation to the derivative $W_x(x)$
carefully, if the finite difference equation is to have an interpretation in
terms of a Markov chain. Define the one sided difference approximations:
$$
f_x(x) \to \frac{f(x+h) - f(x)}{h} \ \text{ if } b(x) \ge 0,
\qquad
f_x(x) \to \frac{f(x) - f(x-h)}{h} \ \text{ if } b(x) < 0.
\tag{1.8}
$$
That is, if the velocity at a point is nonnegative, then use the forward
difference, and if the velocity at a point is negative, then use the backward
difference. Such schemes are known as the "upwind" approximation method
in numerical analysis. Define the positive and negative parts of a real number
by $a^+ = \max[a, 0]$, $a^- = \max[-a, 0]$. Using (1.8) in (1.7) yields

$$
\frac{W^h(x+h) - W^h(x)}{h}\, b^+(x)
 - \frac{W^h(x) - W^h(x-h)}{h}\, b^-(x) + k(x) = 0
\tag{1.9}
$$
for $x \in (0,B)$ and with boundary conditions $W^h(0) = W^h(B) = 0$. Define
the functions
$$
p^h(x, x+h) = I_{\{b(x) \ge 0\}}, \qquad p^h(x, x-h) = I_{\{b(x) < 0\}},
$$
and the interpolation interval $\Delta t^h(x) = h/|b(x)|$. The $p^h(x, x \pm h)$ are
transition probabilities for a Markov chain on the state space $S_h$. Now,
collecting terms in (1.9), noting that $b^+(x) + b^-(x) = |b(x)|$, and dividing
by the coefficient of $W^h(x)$ yields, for $x \in G_h^0$,
$$
\begin{aligned}
W^h(x) &= W^h(x+h)\,\frac{b^+(x)}{|b(x)|} + W^h(x-h)\,\frac{b^-(x)}{|b(x)|}
 + k(x)\,\frac{h}{|b(x)|}\\
 &= W^h(x+h)\,p^h(x,x+h) + W^h(x-h)\,p^h(x,x-h) + k(x)\,\Delta t^h(x).
\end{aligned}
\tag{1.10}
$$
The $p^h(x,y)$ define a Markov chain $\{\xi_n^h, n < \infty\}$ on $S_h$. If $\inf_x |b(x)| \ne 0$,
then the chain together with the interpolation interval $h/|b(x)|$ is lo-
cally consistent with the "process" defined by the solution to $\dot x = b(x)$ in
that (4.1.3) holds. In particular, $E_{x,n}^h\Delta\xi_n^h = b(x)\,\Delta t^h(x)$ and $\mathrm{cov}_{x,n}^h\Delta\xi_n^h =
O(h^2) = o(\Delta t^h(x))$.

A Modification if $\inf_x |b(x)| = 0$. If $\inf_x |b(x)| = 0$, then the above
calculation breaks down in that $\Delta t^h(x) = \infty$ at some point $x$. Because that
state is absorbing anyway, the infinite value is not surprising. If we wish to
use interpolation intervals which go to zero as $h \to 0$, the degeneracy can
be circumvented by simply allowing transitions of the states of the chain to
themselves. To see how this can be done, let $0 \le p^h(x,x) \le 1$ and rewrite
(1.10) as
$$
\begin{aligned}
W^h(x) &= p^h(x,x)\,W^h(x)
 + (1 - p^h(x,x))\,\frac{b^+(x)}{|b(x)|}\, W^h(x+h)\\
 &\quad + (1 - p^h(x,x))\,\frac{b^-(x)}{|b(x)|}\, W^h(x-h)
 + (1 - p^h(x,x))\,\Delta t^h(x)\,k(x).
\end{aligned}
\tag{1.10$'$}
$$
Define the new transition probabilities
$$
p^h(x, x \pm h) = (1 - p^h(x,x))\,\frac{b^\pm(x)}{|b(x)|}
$$
and the new interpolation interval $\Delta t^h(x) = (1 - p^h(x,x))\,h/|b(x)|$. One
can now readily choose the $p^h(x,x)$ to get a locally consistent transition

probability and interpolation interval. For example, if $b(x) = 0$, then set
$p^h(x,x) = 1$ and $\Delta t^h(x) = h$. Note that (1.10) and (1.10$'$) have the same
solution if $p^h(x,x) < 1$ for all $x$.

The Solution to (1.10$'$). In order to facilitate writing the solution to
(1.10$'$), suppose that there is a point $x_0$ such that for $x > x_0$ we have
$b(x) > 0$, for $x < x_0$ we have $b(x) < 0$, and $b(x_0) = 0$. Then, in terms
of $\{\xi_n^h, n < \infty\}$, the solution to (1.10$'$) can be written as
$$
W^h(x) = E_x \sum_{n=0}^{N_h - 1} k(\xi_n^h)\,\Delta t^h(\xi_n^h),
$$
where $N_h$ is again the first escape time of the chain from $G_h^0$. If $x_0 \in (0,B)$,
then the process might get stuck at $x_0$, with value $W^h(x_0) = 0$ if $k(x_0) = 0$,
and $W^h(x_0) = \pm\infty$ otherwise, according to the sign of $k(x_0)$. It turns out
that $W^h(x) \to W(x)$, and the interpolated processes $\xi^h(\cdot)$ and $\psi^h(\cdot)$ both
converge to the solution to $\dot x = b(x)$. Thus, the finite difference method
automatically gives a chain which satisfies the consistency conditions and
can be used to approximate the original process as well as functionals of it.

Remark on the Choice of Finite Difference Approximation (1.8)
and on $\Delta t^h(x)$. Note that the choice of the finite difference approximation
depends on the sign of $b(x)$; i.e., the "direction" of the approximation
depends on the sign of $b(x)$ because we want the direction of movement
of the chain at a point $x$ to reflect the sign of the velocity at $x$. Such a
choice is not unusual in numerical analysis, and will be commented on in
Examples 3 and 4 also. Note also that $\Delta t^h(x)$ is just the inverse of
the absolute value of the velocity at $x$ times the spatial difference interval.
Thus, in time $\Delta t^h(x)$ the continuous time interpolations $\xi^h(\cdot)$ and $\psi^h(\cdot)$
move an average of $h$ units.

Example 3. A Diffusion With Drift: No Control. Now we combine
the two cases of Examples 1 and 2. Let $x(\cdot)$ be the process which satisfies
$$
dx = b(x)\,dt + \sigma(x)\,dw, \quad x(0) = x,
\tag{1.11}
$$
where we assume that $b(\cdot)$ and $\sigma(\cdot)$ are bounded and continuous and
$$
\inf_x\big(\sigma^2(x) + |b(x)|\big) > 0.
$$
If this last restriction does not hold, then we can continue by adding tran-
sitions from appropriate states $x$ to themselves, as discussed in Example
2. Let $W(x)$ be defined as in Example 1. Then if $W(\cdot)$ is smooth enough,
Ito's formula implies (1.2), where $\mathcal{L}$ is the differential operator of the pro-
cess (1.11). In particular,
$$
W_x(x)\,b(x) + W_{xx}(x)\,\sigma^2(x)/2 + k(x) = 0, \quad x \in (0,B),
\qquad W(0) = W(B) = 0.
\tag{1.12}
$$
Use the finite difference approximations (1.3) and (1.8), and again let
$W^h(x)$ denote the finite difference approximation. [A possible alternative
to (1.8) is given in (1.18) below.] Substituting these approximations into
(1.12), collecting terms, multiplying by $h^2$, and dividing all terms by the
coefficient of $W^h(x)$ yields the approximating equation
$$
W^h(x) = \frac{\sigma^2(x)/2 + h b^+(x)}{\sigma^2(x) + h|b(x)|}\, W^h(x+h)
 + \frac{\sigma^2(x)/2 + h b^-(x)}{\sigma^2(x) + h|b(x)|}\, W^h(x-h)
 + k(x)\,\frac{h^2}{\sigma^2(x) + h|b(x)|}.
\tag{1.13}
$$
Let us rewrite this as
$$
W^h(x) = p^h(x,x+h)\,W^h(x+h) + p^h(x,x-h)\,W^h(x-h) + k(x)\,\Delta t^h(x),
\tag{1.14}
$$
where the $p^h$ and $\Delta t^h$ are defined in the obvious manner. Let $p^h(x,y) = 0$
for $y \ne x \pm h$. The $p^h(x,y)$ are nonnegative and sum (over $y$) to unity for
each $x$. Thus, they can be considered to be transition probabilities for a
Markov chain on the state space $S_h$. Let $\{\xi_n^h, n < \infty\}$ denote the chain,
and let us check for local consistency. We have
$$
E_{x,n}^h\,\Delta\xi_n^h
 = h\,\frac{\sigma^2(x)/2 + h b^+(x)}{\sigma^2(x) + h|b(x)|}
 - h\,\frac{\sigma^2(x)/2 + h b^-(x)}{\sigma^2(x) + h|b(x)|}
 = b(x)\,\Delta t^h(x),
\tag{1.15}
$$
$$
E_{x,n}^h\,(\Delta\xi_n^h)^2
 = h^2\left(\frac{\sigma^2(x)/2 + h b^+(x)}{\sigma^2(x) + h|b(x)|}
 + \frac{\sigma^2(x)/2 + h b^-(x)}{\sigma^2(x) + h|b(x)|}\right)
 = \sigma^2(x)\,\Delta t^h(x) + o(\Delta t^h(x)).
\tag{1.16}
$$
Also, $E_{x,n}^h[\Delta\xi_n^h - E_{x,n}^h\Delta\xi_n^h]^2$ has the representation (1.16), where $o(\Delta t^h(x))
= \Delta t^h(x)\,O(h)$. Thus the chain (with the given interpolation interval) is
locally consistent with the process defined by (1.11). Again the finite dif-
ference approximations were used only in a mechanical way to get the
transition probabilities and interpolation interval for a locally consistent
approximating chain.

Note that the interpolation interval $\Delta t^h(x)$ depends on both the drift and
the diffusion coefficients. As the magnitudes of these coefficients increase,
the interval decreases in size. The relative effects of the two coefficients on
the interpolation interval are governed by the two natural time scales: The
time scale for the pure drift case (Example 2) is $O(h)$, and that for the
pure diffusion case (Example 1) is $O(h^2)$.
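The construction (1.13)-(1.14) is mechanical enough to code directly. The
sketch below (Python, not from the book) builds $p^h(x, x\pm h)$ and $\Delta t^h(x)$
from user-supplied drift and diffusion functions `b` and `sigma` (hypothetical
names, with made-up coefficients in the example), and checks the local
consistency relations (1.15)-(1.16) numerically at one grid point.

```python
def one_dim_chain(x, h, b, sigma):
    """Transition probabilities p^h(x, x +/- h) and interval Delta t^h(x)
    from the upwind finite difference construction (1.13)-(1.14)."""
    bx, s2 = b(x), sigma(x) ** 2
    denom = s2 + h * abs(bx)
    p_up = (s2 / 2 + h * max(bx, 0.0)) / denom
    p_down = (s2 / 2 + h * max(-bx, 0.0)) / denom
    dt = h ** 2 / denom
    return p_up, p_down, dt

# Check at one point for the illustrative coefficients b(x) = 1 - x, sigma(x) = 0.5.
b = lambda x: 1.0 - x
sigma = lambda x: 0.5
x, h = 0.3, 0.01
p_up, p_down, dt = one_dim_chain(x, h, b, sigma)
mean = h * p_up - h * p_down            # conditional mean increment
second = h**2 * (p_up + p_down)         # conditional second moment
print(abs(mean - b(x) * dt) < 1e-12)                    # equals b(x) dt, cf. (1.15)
print(abs(second - sigma(x)**2 * dt) <= h * abs(b(x)) * dt + 1e-12)  # sigma^2 dt + o(dt), cf. (1.16)
```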

A Central Difference Approximation. If the diffusion term always
"dominates" the drift term for the values of $h$ of interest, then the one sided
difference approximations (1.8) can be replaced by a two sided or symmetric
finite difference approximation. This is preferable, whenever possible, and
yields smaller errors [127]. Suppose that
$$
\inf_x\big[\sigma^2(x) - h|b(x)|\big] \ge 0
\tag{1.17}
$$
for all $h$ of interest, and use the symmetric difference approximation
$$
f_x(x) \to \frac{f(x+h) - f(x-h)}{2h}
\tag{1.18}
$$
for the first derivative in lieu of (1.8). Then repeating the procedure which
led to (1.14) yields the following finite difference equation and new transi-
tion probabilities and interpolation interval:
$$
\begin{aligned}
W^h(x) &= \frac{\sigma^2(x) + h b(x)}{2\sigma^2(x)}\, W^h(x+h)
 + \frac{\sigma^2(x) - h b(x)}{2\sigma^2(x)}\, W^h(x-h)
 + k(x)\,\frac{h^2}{\sigma^2(x)}\\
 &= p^h(x,x+h)\,W^h(x+h) + p^h(x,x-h)\,W^h(x-h) + k(x)\,\Delta t^h(x).
\end{aligned}
\tag{1.19}
$$
Local consistency can be shown as in (1.15) and (1.16).

If (1.17) does not hold, then the $p^h(x, x\pm h)$ in (1.19) are not all non-
negative. Thus, they cannot serve as transition probabilities and, in fact,
serious instability problems might arise.

Example 4. A One Dimensional Example with Control. Now turn
to a discussion of a controlled one dimensional diffusion. Let the control
$u(\cdot)$ be of the feedback type, with values $u(x)$ in $U$, a compact control set.
Let $x(\cdot)$ be defined by
$$
dx = b(x, u(x))\,dt + \sigma(x)\,dw.
\tag{1.20}
$$
Again, let $\tau = \min\{t : x(t) \notin (0,B)\}$ and define the cost function
$$
W(x,u) = E_x^u\left[\int_0^\tau k(x(s), u(x(s)))\,ds + g(x(\tau))\right],
\qquad W(x,u) = g(x) \ \text{ for } x = 0, B.
$$
As in the previous examples, the functions $k(\cdot)$ and $g(\cdot)$ and the interval
$[0,B]$ play a purely auxiliary role. Formally applying Ito's formula to the
function $W(x,u)$ yields the equation (see Section 3.2)
$$
\mathcal{L}^{u(x)}W(x,u) + k(x, u(x)) = 0, \quad x \in (0,B),
\tag{1.21}
$$
with boundary conditions $W(0,u) = g(0)$, $W(B,u) = g(B)$, where $\mathcal{L}^\alpha$ is
the differential operator of $x(\cdot)$ when the control is fixed at $\alpha$.

To get a locally consistent Markov chain and interpolation interval, sim-
ply follow the procedure used in Example 3; namely, use the finite difference
approximations (1.3) and (1.8) for the $W_{xx}(x,\alpha)$ and $W_x(x,\alpha)$ in (1.21).
[(1.18) should be used, if possible.] Define
$$
p^h(x, x+h|\alpha) = \frac{\sigma^2(x)/2 + h b^+(x,\alpha)}{\sigma^2(x) + h|b(x,\alpha)|},
\quad
p^h(x, x-h|\alpha) = \frac{\sigma^2(x)/2 + h b^-(x,\alpha)}{\sigma^2(x) + h|b(x,\alpha)|},
\quad
\Delta t^h(x,\alpha) = \frac{h^2}{\sigma^2(x) + h|b(x,\alpha)|}.
\tag{1.22}
$$
For $y \ne x \pm h$, set $p^h(x,y|\alpha) = 0$. Then the constructed $p^h$ are transition
probabilities for a controlled Markov chain. Local consistency of this chain
and interpolation interval can be shown exactly as for Example 3. Also, by
following the procedure which led to (1.13), we see that the formal finite
difference form of (1.21) is just
$$
W^h(x,u) = \sum_y p^h(x,y|u(x))\,W^h(y,u) + k(x,u(x))\,\Delta t^h(x,u(x))
\tag{1.23}
$$
for $x \in G_h^0$ and with the same boundary conditions as $W(x,u)$ satisfies.
If (1.23) has a unique solution, then it is the cost associated with the
controlled chain, namely,
$$
W^h(x,u) = E_x^u \sum_{n=0}^{N_h-1} k(\xi_n^h, u(\xi_n^h))\,\Delta t^h(\xi_n^h, u(\xi_n^h))
 + E_x^u\, g(\xi_{N_h}^h).
$$
The dynamic programming equation for the optimal value function is
$$
V^h(x) = \min_{\alpha\in U}\Big[\sum_y p^h(x,y|\alpha)\,V^h(y)
 + k(x,\alpha)\,\Delta t^h(x,\alpha)\Big]
\tag{1.24}
$$
with the same boundary conditions as for $W(x,u)$.

We emphasize that no claim is made that the convergence of the finite
difference approximations can be proved via the classical methods of nu-
merical analysis. The finite difference approximation is used only to get the
transition probabilities of a Markov chain which is locally consistent with
(1.20).
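For a fixed feedback control $u(\cdot)$, (1.23) is a linear system in the values
$W^h(x,u)$, $x \in G_h^0$, and can be solved directly; this is the policy evaluation
step used by the approximation in policy space method discussed in the next
section. A small sketch (Python, not from the book; `b`, `sigma`, `k`, `g`, and
the feedback rule `u` are hypothetical user-supplied functions) using the
transition probabilities (1.22):

```python
import numpy as np

def evaluate_policy(B, h, b, sigma, k, g, u):
    """Solve (1.23) for the cost W^h(x, u) of a fixed feedback control u(x)
    on the grid {0, h, ..., B}, with boundary values g(0), g(B)."""
    xs = np.arange(0.0, B + h/2, h)
    n = len(xs)
    A = np.eye(n)
    rhs = np.zeros(n)
    rhs[0], rhs[-1] = g(xs[0]), g(xs[-1])          # boundary conditions
    for i in range(1, n - 1):
        x, a = xs[i], u(xs[i])
        denom = sigma(x)**2 + h * abs(b(x, a))
        p_up = (sigma(x)**2 / 2 + h * max(b(x, a), 0.0)) / denom
        p_dn = (sigma(x)**2 / 2 + h * max(-b(x, a), 0.0)) / denom
        dt = h**2 / denom
        A[i, i+1] -= p_up                          # row: W(x) - sum_y p(x,y) W(y) = k dt
        A[i, i-1] -= p_dn
        rhs[i] = k(x, a) * dt
    return xs, np.linalg.solve(A, rhs)
```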

5.2 Numerical Simplifications and Alternatives for Example 4

5.2.1 Eliminating the control dependence in the denominators of $p^h(x,y|\alpha)$ and $\Delta t^h(x,\alpha)$
The possible presence of the control parameter $\alpha$ in the denominators of
the expressions for the transition probabilities and interpolation interval
in (1.22) will complicate getting the solution of (1.24) if it complicates the
procedure needed to evaluate the minimum on the right hand side. All of the
current methods used to solve (1.24) require the computation of minima
analogous to the right hand side of (1.24). The iteration in policy space
method, described in Chapter 6 and also briefly below, is one generally
popular approach. This method requires the solution of the sequence of
equations for the costs associated with a "minimizing" sequence of controls,
and each control is determined via an operation similar to the minimization
on the right side of (1.24).

The ease of solution of the approximating equations, which are (1.23)
and (1.24) in our case, is a key concern in deciding on the approximating
chain. For any particular problem, there are many alternative locally consis-
tent approximating chains which can be used, some being more convenient
than others. One has to weigh the work needed to obtain the approximat-
ing chain against the amount of work needed to actually solve (1.23) or
(1.24). Given the transition probabilities for any locally consistent chain,
one can often simplify them from the computational point of view, while
maintaining local consistency. Getting a useful chain is often just a matter
of common sense. The intuition of numerical analysis is combined with the
intuition obtained from the physical interpretation of the Markov chain
approximations. We will illustrate some useful methods for simplifying the
dependence of the transition probabilities and interpolation interval on the
control. The remarks in this section deal with Example 4 of the last
section but are of general applicability.
One simple way of eliminating the dependence of the denominators in
the expressions for $p^h(x,y|\alpha)$ and $\Delta t^h(x,\alpha)$ on the control parameter $\alpha$
starts with (1.22) and proceeds as follows. Define the function $B(x) =
\max_{\alpha\in U}|b(x,\alpha)|$ and define
$$
\bar p^h(x, x\pm h|\alpha) = \frac{\sigma^2(x)/2 + h b^\pm(x,\alpha)}{\sigma^2(x) + h B(x)},
\qquad
\Delta\bar t^h(x) = \frac{h^2}{\sigma^2(x) + h B(x)}.
\tag{2.1}
$$
The $\bar p^h(x,y|\alpha)$ might sum (over $y$) to less than unity for some $x$ and $\alpha$. To
handle this, we define the "residual"
$$
\bar p^h(x,x|\alpha) = 1 - \sum_{y\ne x}\bar p^h(x,y|\alpha)
 = \frac{h\big(B(x) - |b(x,\alpha)|\big)}{\sigma^2(x) + h B(x)}.
\tag{2.2}
$$
It can be readily shown that the chain associated with the transition prob-
abilities $\bar p^h(x,y|\alpha)$ and interpolation interval $\Delta\bar t^h(x)$ is locally consistent
with (1.20). The difference between the barred and the unbarred values
is $O(h)$. The symmetric finite difference approximation (1.18) to the first
derivative can also be used if
$$
\min_x\big[\sigma^2(x) - h B(x)\big] \ge 0.
$$
It will be seen below that the use of (2.1) and (2.2) is equivalent to the use
of (1.22). They yield the same cost functions and minimum cost.
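A sketch of this transformation (Python, not from the book; `b` and `sigma`
are hypothetical user-supplied coefficient functions, and `U` is a finite list of
control values):

```python
def control_free_denominator(x, h, b, sigma, U):
    """Transition probabilities (2.1)-(2.2): the denominator uses
    B(x) = max_a |b(x, a)|, so Delta t^h depends on x only; the slack
    is absorbed by a self-transition p^h(x, x | a)."""
    Bx = max(abs(b(x, a)) for a in U)
    s2 = sigma(x) ** 2
    denom = s2 + h * Bx
    dt = h ** 2 / denom                       # control independent interval
    probs = {}
    for a in U:
        p_up = (s2 / 2 + h * max(b(x, a), 0.0)) / denom
        p_dn = (s2 / 2 + h * max(-b(x, a), 0.0)) / denom
        probs[a] = {x + h: p_up, x - h: p_dn,
                    x: h * (Bx - abs(b(x, a))) / denom}   # residual (2.2)
    return probs, dt
```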

Controlled Variance. So far, we have not considered cases where the
variance $\sigma(\cdot)$ might depend on the control parameter. The mathematics re-
quired for the convergence proofs is somewhat more complicated when there
is such dependence. See Chapter 13 and [94]. The numerical algorithms are
obtained in the same way as when only the drift $b(\cdot)$ is controlled. When
the variance depends on the control, it is often necessary to carefully ex-
amine the computational complexity of taking minima in (1.24) or in (2.7)
below. The method described above can also be used in this case, simply
by defining the new transition probabilities and interpolation intervals
(using central differences, if possible):
$$
\bar p^h(x, x\pm h|\alpha) = \frac{\sigma^2(x,\alpha)/2 + h b^\pm(x,\alpha)}{D_h(x)},
\qquad
\bar p^h(x,x|\alpha) = 1 - \sum_{y\ne x}\bar p^h(x,y|\alpha)
 = \frac{D_h(x) - \big[\sigma^2(x,\alpha) + h|b(x,\alpha)|\big]}{D_h(x)},
\qquad
\Delta\bar t^h(x) = \frac{h^2}{D_h(x)},
$$
where $D_h(x)$ satisfies $D_h(x) \ge \sigma^2(x,\alpha) + h|b(x,\alpha)|$ for all $\alpha$ (e.g.,
$D_h(x) = \max_{\alpha\in U}[\sigma^2(x,\alpha) + h|b(x,\alpha)|]$).
The discussion is continued in Section 5.9.

5.2.2 A useful normalization if $p^h(x,x|\alpha) \ne 0$

The numerical methods for (1.23) and (1.24) generally converge faster if
$p^h(x,x|\alpha) = 0$. From an intuitive point of view, this is due to the obser-
vation that the positivity of the $p^h(x,x|\alpha)$ implies that the Markov chain
"mixes" more slowly. Loosely speaking, the faster the mixing, the faster
the convergence. Suppose that the $\bar p^h(x,y|\alpha)$ defined in (2.1) and (2.2) are
used for the $p^h(x,y|\alpha)$ in (1.23) or (1.24). We have $\bar p^h(x,x|\alpha) = O(h) \ne 0$.
The difference from zero is not large, but numerical studies suggest that
even in this case there is an advantage to eliminating the transition of a
state to itself. When approximating deterministic optimal control problems
the improvement can be very substantial (see Section 15.4). Let us see how
this can be done.

The Equivalence of ((2.1), (2.2)) and (1.22). Let Wh(x, u) denote


the cost under (2.1) and (2.2). Then

Wh(x, u) = LPh(x, yiu(x))Wh(y, u) + ph(x, xiu(x))Wh(x, u)


yf.x
+ k(x, u(x))~fh(x, u(x)),
(2.3)
which can be written as

-h Ly¥:z:Ph (x,yiu(x))W
-h
(y,u) + k(x,u(x))~t~ (x,u(x))
W (x,u)= h '
1- p (x,xiu(x))
or, equivalently,

Wh(x,u) = LPh(x,yiu(x))Wh(y,u) + k(x,u(x))~th(x,u(x)). (2.4)


yf.x

Comparing (2.4) to (1.23), we see that Wh(x, u) = Wh(x, u) for all feedback
controls u(·) for which (1.23) or (2.4) has a unique solution. Thus one can
use either equation with the same results. This procedure for eliminating
the transitions from any state to itself is called normalization.
These observations apply to any situation where we start with a tran-
sition probability under which some states communicate to themselves.
Just to complete the picture, suppose that we are given a locally consistent
chain with transition probabilities and interpolation interval ph(x, yia) and
~fh(x, a), respectively, where ph(x,xla) might be positive for some x and
a. Let Wh(x, u) denote the associated value of the cost. Then

Wh(x, u) = L ph(x, yiu(x))Wh(y, u) +ph(x, xiu(x))Wh(x, u)


yf.x
+ k(x, u(x))~th(x, u(x)).
(2.3')
For x i= y, define

ph(x, yla)
1- jjh(x,xla)'
,.,_ (2.5)
~t (x,a)
1- jjh(x,xia)'
102 5. The Approximating Markov Chains

and set ph(x, xia) = 0. The chain with this new transition probability and
interpolation interval obviously yields the same cost as does the original
chain [under the same control u( ·)] if the solution to (2.3') is unique, because
by construction of the ph(x, yia) and L).t'(x), the equations for the cost are
the same in both cases. It is readily verified that (2.5) yields a locally
consistent chain.

A Comment on Normalization for the Discounted Cost Problem.


If there is a discount rate {3 > 0, then we need to be more careful with
the use of the type of transformations just discussed, because the discount
factor also depends on the interpolation interval. The transformed tran-
sition probabilities (as in (2.5)) will still give a locally consistent Markov
chain, of course, because consistency does not depend on whether or not
there is discounting. But, if the cost is discounted, use the discount factor
exp -{3i).fh(x, a) (or an appropriate approximation) which corresponds to
the new interpolation interval i).th(x, a). Henceforth in this chapter, unless
otherwise mentioned, the discount factor will be dropped.

The Approximation in Policy Space Method. Introduction. Many


computational methods for solving (1.24) are versions of the so-called ap-
proximation in policy space technique. (See Chapter 6 for more detail.)
With this method, one sequentially computes a ''minimizing" sequence of
feed back control policies {Un (·)} for the chain, and the control Un+ 1 ( ·) is
obtained from an approximation to the solution to the equation for the cost
under unO· Suppose that a "tentative" optimal control for the controlled
chain and given cost is Un(·), and we wish to compute Un+l(·). To do this
we can work with either the original ph(x, yia) or the form (2.1), (2.2) (to
eliminate the control dependence of the denominator). Suppose that we
are given locally consistent transition probabilities ph (x, yla) and interpo-
lation interval i).th(x, a) where ph(x, xia) = 0, and compute ph(x, yia) and
L).fh(x, a) from them as in (2.1) and (2.2). The cost function Wh(x, u) sat-
isfies (1.23) for the original transition probabilities and (2.6) for the "bar"
system:

Wh(x, Un) = LPh(x, yiun(x))Wh(y, Un) + k(x, Un(x))L).~(x, Un(x))


y
= ph(x,xlun(x))Wh(x,un) + LPh(x,yiun(x))Wh(y,un)
y#x
+ k(x, Un(x))L).~(x, un(x))
(2.6)
for x E G~ with the boundary conditions Wh(x, un) = g(x), for x E {0, B}.
In fact, equation (2.6) is equivalent to (1.23) for u(·) = un(-), where the
original ph(x,yia) and i).th(x,a) are used, in that the solutions are the
same. Thus, we could use ph(x, yia) and i).th(x, a) in (2.6) and get the same
solution. In order to calculate the next control, one gets an approximate
5.2 Numerical Simplifications 103

value for Wh(·,un) and then calculates Un+l(·) from either

or

._+I (z) ~ arg~ [~)"(x, yla)W'(y, ._) +k(x, a)ai"(x, a)] (2. 7b)

depending on computational convenience. It is sometimes numerically sim-


pler to use the (ph(x,yia),Aft(x,a)) because the denominators of these
expressions do not depend on a. The sequence of minimizing controls might
not be the same for the two cases. But owing to the local consistency of
both approximations, the asymptotic results (h--+ 0) will be the same.

5.2.3 Alternative Markov chain approximations for Example


4 of Section 5.1: Splitting the operator
The general procedure outlined in Example 4 has many variations, some
being more convenient for particular cases. Some of the issues involved in
the choice of the chain can be seen via a concrete example. We will go
through some ''typical" calculations in order to illustrate some of the pos-
sibilities for exploiting the structure of particular problems, concentrating
on the one dimensional form

dx = (b(x) + u(x))dt + <1(x)dw, iu(x)i ~ 1,


(2.8)
k(x,a) = k(x) +kola!, ko > 0,

where the functions are appropriately bounded and smooth. Similar consid-
erations work for the general r-dimensional problem. First try the central
difference based rule:

h( ±hi ) _ <1 2 (x) ± h(b(x) +a)


p x,x a - 20' 2 •

If this can't be used, then try a compromise between the central difference
and some one sided form; e.g.,

h( ±hi ) _ <1 2 (x) ±a+ hib(x)i±


P x,x a - 2<12 + hib(x)i .

Otherwise, use the procedure (1.2) to get


±
- <1 (x)/2 + h(b(x) +a) ,
2 -
ph( x,x ±hi a ) _
<1 2 (x) + hib(x) + ai
104 5. The Approximating Markov Chains

which is an awkward expression if one wishes to use it in a minimization


such as in (2.7).
We will show how to simplify it via a simple adaptation of the finite
difference procedure of Example 4 of Section 5.1. First rewrite £ 0 f(x)
in the "split up" form, where the two terms in which the first derivative
appears are separated:

u 2 (x)
L0 f(x) = fx(x)b(x) + fx(x)a + - 2-fxx(x). (2.9)

Each of the two terms involving the first derivative will be approximated
separately. We will use the difference approximation

_ { b(x/(x + hh- f(x) for b(x) ~ 0


(2.10a)
fx(x)b(x) ---+ b(x/(x)- {(x- h)
for b(x) < 0,

f(x+h)-f(x)
fora~O
{ a h
(2.10b)
fx(x)a---+ J(x) - f(x- h)
a h for a< 0.
Following the procedure of Example 4, but using (2.10) for the first deriva-
tive terms, yields the expression (1.23) but with the following new defini-
tions of the transition probabilities and interpolation interval: Define the
normalization Qh(x, a) = u 2(x) + hlal + hlb(x)l and

u 2 (x)/2 + hb±(x) + ha±


Qh(x,a) (2.11)
h2
=
Qh(x,a)"

The chain defined by (2.11) is locally consistent with the controlled process
defined by (2.8).
The transition probabilities and interpolation interval in (2.11) differ
from (1.22) by having the absolute values of a and b(x) separated in the
denominators and (a + b( x)) ± being replaced by a± + b( x )± in the numer-
ators, an advantage from the coding point of view. The denominators still
have an a-dependence. This can be eliminated via the method which led to
(2.1), (2.2). To do this, define the maximum of Qh(x, a) over a E U = [0, 1),
namely, Qh(x) = u 2 (x) + h + hlb(x)l. Then use

ph(x, x ± hla) (u 2 (x)/2 + hb±(x) + ha±)/Qh(x),


ph(x,xla) = h(1 -lal)jQh(x), (2.12)
~fl(x) h2 /Qh(x).
5.2 Numerical Simplifications 105

The chain defined by (2.12) is locally consistent with the controlled diffusion
(2.8).

Solving for the Minimum in (2. 7). The form of the transition proba-
bilities given in (2.12) is convenient for getting the minimum in expressions
such as (2.7b), and we will go through the calculation for our special case.
The expression (2.7b) is equivalent to:

Un+l(x) = arg min [hWh(x + h, Un)a+ + hWh(x- h, Un)a-


iai9
+ koiaih 2 - hWh(x, Un)iaiJ.
It is easily seen that the minimizing a will take one of the values { -1, 0, 1}.
Suppose that Wh(x + h, un) ::; Wh(x- h, Un)· Then the minimizing a will
be nonnegative and the Wh(x- h, un) can be dropped from the expression.
Otherwise, it will be nonpositive and the Wh(x + h, un) can be dropped.
In fact, for the first case, if

then we can use Un+ 1 (x) = 1 for the minimizing control value. In general,
we can conclude that

otherwise, Un+l (x) = 0.


Numerical minimization can be quite time consuming, and should be
avoided if possible. Thus, it is important to keep in mind that the minimum
in (2.7) should be solved for explicitly if at all possible. Also, in an actual
program one writes the simplest decision tree for the choices above, and
division by h is not required.

The Symmetric Difference Approximation for the First Deriva-


tive. If
u2 (x)- h- hib(x)i 2:: 0,
then the symmetric finite difference approximation (1.18) can be used for
the first derivatives in (2.9) at x, and the calculations are simpler, because
the denominators of the ph(x, yia) and 6.th(x, a) will not then depend on
a. If the above maximum is negative, then we still might be able to use the
106 5. The Approximating Markov Chains

symmetric finite difference for one of the first derivative terms, as follows.
Suppose that a 2 (x)- h ~ 0. Then use the one sided finite difference for
the representation of Wx(x, u)b(x) and the symmetric finite difference for
the approximation of Wx(x, u)a at x. With the use of this approximation,
we still get a Markov chain which is locally consistent with {2.8), and the
calculation of the infima in {2. 7) is still relatively simple. The transition
probabilities are given by
ph(x,x ± hia) = [a 2 (x)/2 + hb(x)± ± ha/2]/[a 2 (x) + hlb(x)l].
The other ph(x, yia) are zero, and

~th(x) = h2 j[a 2 (x) + hlb(x)IJ.


Sometimes the value of h used in the numerical calculation is large.
For example, in the multigrid method discussed in Chapter 6, some of the
calculations are done on a very coarse grid. Also, it is sometimes convenient
to get a rough solution to the dynamic programming equation with a large
value of h, and then use that as a starting solution for a finer grid. In such
cases, we need to be careful that the nonnegativity conditions hold.

5.3 The General Finite Difference Method


The method of finite differences for getting the transition probabilities and
interpolation time for a locally consistent chain for the general vector case
uses the same procedure as the scalar case examples in Sections 5.1 and
5.2. We will work with the controlled process x( ·) satisfying:
dx = b(x, u(x)) + a(x)dw, x E JW. {3.1)
Define the covariance matrix a(x) = a(x)a(x)' = {aij(x)}, i,j = 1, ... , r,
and recall the definition of the differential operator of (3.1) .ca
r a 1 r {)2
.C0 = Lbi(x,a)~ + 2 L aij(x)~. {3.2)
. 1
t=
x, ..
I,J=
1 x, x,

Let ei denote the unit vector in the i-th coordinate direction and let lR'h
denote the uniform h-grid on IRr; i.e., lR'h = {x: x = h L:i eimi : mi =
0, ±1, ±2, ... }. Until further notice, we use Sh = lR'h as the state space of
the approximating Markov chain. It is not necessary to use such a uniform
grid but it simplifies the explanation.
Recall the procedure used for the scalar cases in Sections 5.1 and 5.2.
Our aim in this and in the following sections is just to get an approximat-
ing Markov chain which is locally consistent with {3.1). The procedure of
Section 5.1 started with a partial differential equation of the type {3.3):
.cu(x)W(x, u) + k(x, u(x)) = 0, {3.3)
5.3 The General Finite Difference Method 107

where the function k(·) played only an auxiliary role. Then suitable finite
difference approximations for the derivatives at the point x were substi-
tuted into (3.3), terms collected, and all terms divided by the coefficient of
Wh(x,u). We use Wh(x,u) to denote the finite difference approximation
to (3.3). This procedure gave the transition probabilities and interpolation
interval automatically, as the coefficients in the resulting finite difference
equation. We follow the same procedure and show that it works, together
with all of the variants discussed in Sections 5.1 and 5.2.
The finite difference approximations used in this section are simple. E.g.,
only derivatives along the coordinate directions are taken, and the grid is
uniform. Other approximations might be useful in particular situations and
some further comments appear below. The method is a special case of the
approach in the next section.

The Diagonal Case. For ease of explanation, we first write the expressions
for the case where aij(x) = 0 fori -:f. j. Proceed as in Section 5.1, Example
4. As in that example, the boundary conditions and boundary cost play no
role in getting the chain, and we ignore them. For the second derivative,
we use the standard approximation

(3.4)

For the approximations to the first derivatives, we can adapt any of the
previously discussed schemes. Let us start with the use of one sided ap-
proximations analogous to (1.8), namely,

f () { [f(x + eih)- f(x)]/h if bi(x, a) ~ 0


(3.5)
x; x ~ [f(x)- f(x- eih)l/h if b.{x, a) < 0.

Note that the direction (forward or backward) of the finite difference ap-
proximation (3.5) is the direction of the velocity component bi(x,a). Let
X E Sh.
Use (3.4) and (3.5) in (3.3), denote the solution by Wh(x, u), collect
terms, clear the fraction by multiplying all terms by h2 , then divide all
terms by the coefficient of Wh(x,u) to get the finite difference equation

Wh(x, u) = LPh(x, yJu(x))Wh(y, u) + k(x, u(x))~th(x, u(x)), (3.6)


y

where
ph(x, x ± eihJa)

~th(x,a) (3.7)

Qh(x,a)
108 5. The Approximating Markov Chains

and suppose that ath(x, a) = O(h). For y E sh not of the form X± hei for
some i, set ph(x, yla) = 0.
The expressions (3.7) reduce to (1.22) when x(t) is real valued. The
constructed ph(x, yla) are nonnegative. For each x and a, they sum (over
y) to unity. Thus, they are transition probabilities for a controlled Markov
chain. The local consistency of (3.7) with x(·) is easy to check. In particular,

E!;~ae! = b(x,a)ath(x,a), (3.8)

Thus, for the case where the a(·) matrix is diagonal, the finite difference
method easily yields a locally consistent set of transition probabilities. All
the variations discussed in Section 5.2 can be used, and will be illustrated
in a two dimensional example in Subsection 5.3.2 below and at the end of
the next section.

Remark on (3.5): Other Transition Directions. The approximation


(3.5) uses the directions ±ei and the transitions due to the drift terms
are to the points of the type x ± ei. Other forms are possible, provided
only that local consistency holds. For example, in two dimensions, a drift
vector pointing in a direction between, say, e2 and e1 + e2, need only use a
randomization involving the points x + he2 and x + he 1 + he2. This general
idea is also included in the approach of Section 5.4.

5.3.1 The general case


Now consider the case where the off-diagonal terms ai;(x), i -:f. j, are not
all zero. For this case, the method illustrated so far is still useful, but it
breaks down if some of the aij (x) are large with respect to the diagonal
terms aii (x). This can be remedied by a more careful choice of the ap-
proximation, as discussed below or by the approach in the next section.
Analogously to what was done before for the finite difference approxima-
tion of the first derivative terms, we need to let the approximation used
for each off-diagonal term Wx;x 3 (x,u),i -:f. j, depend on the sign of its
coefficient ai;(x) to guarantee that the coefficients of the Wh(y, u) in the
finite difference equation will be nonnegative and sum to unity so that
they have the interpretation of a transition function for a Markov chain.
The procedure which follows might seem somewhat complicated because
the expressions for the finite difference approximations are complicated,
but similar methods are common in the numerical analysis of elliptic equa-
tions, as are the particular approximations (3.10) and (3.11) below. For
illustrative purposes, for the mixed derivatives, where i -:f. j, we use the
5.3 The General Finite Difference Method 109

following standard finite difference approximations: For aii(x) 2::: 0, use

fx;x; (x) --+ [2/(x) + f(x + eih + ejh) + f(x- eih- ejh)]/2h2
- [f(x + eih) + f(x- eih) + f(x + eih)
+ f(x- eih)]/2h 2 •
(3.10)
If aii(x) < 0, we will use

fx;x;(x) - [2/(x) + f(x + eih- eih) + f(x- eih + eih)]/2h 2


--+
+ [f(x + eih) + f(x- eih) + f(x + eih)
+ f(x- eih)]/2h2 •
(3.11)
Each of (3.10) and (3.11) are consistent and standard approximations
to the mixed second derivative. The reasons for these choices will soon be
clear. Assume that
aii(x)- L laii(x)l 2::: 0, (3.12)
j:#i

for all i,x. The condition (3.12) depends on the coordinate system which is
used, and several ways of relaxing it will be discussed later in this section
and in the next section. Define the normalizing coefficient

Qh(x,a)=Laii(x)- L laii(x)l/2+h:L':lbi(x,a)l, (3.13)


i,j:i,Pj

and the interpolation interval

(3.14)

Suppose that Llth(x, a) = O(h). Define the transition probabilities

ph(x,x ± eihla) = [aii(x)/2- L laii(x)l/2 + hbt(x,a)]/Qh(x,a),


j:#i
ph(x, x+ eih + eihla) = ph(x, x- eih- eihla) = a~(x)j2Qh(x, a),
ph(x, x- eih + eihla) = ph(x, x + eih- eihla) = aij(x)f2Qh(x, a).
(3.15)
For y not taking one of the listed values, define ph(x, yJa) = 0.
Equation (3.15) is obtained in the usual way: Substitute (3.4), (3.5),
(3.10) and (3.11) into (3.3), use the symmetry of the a(x) matrix, collect
terms, and divide by the coefficient of Wh(x, u) to get the finite difference
equation (3.6). The transition probabilities and interpolation interval are
the coefficients in that equation.
The chain defined by (3.15) with the interpolation interval (3.14) is lo-
cally consistent with the diffusion (3.1). The consistency is straightforward
110 5. The Approximating Markov Chains

to check. For example, letting ~~,i denote the i-th component of the vector
~~.we have fori i j,
Eh,o.t::,.ch,it::,.ch,j = 2h2 aij(x) = a··(x)t::.th(x a)
x,n '>n '>n 2Qh(x, a) lJ ' •

Similarly to get the "consistency relation"

E~;;:(t::.~~·i) 2 = aii(x)t::.th(x,a) + o(t::.th(x,a)),


note that the probability that the i-th coordinate changes by +h or -his

The different choices for the finite difference approximation to the mixed
second order derivatives are made in order to guarantee that the coefficients
of the off-diagonal terms ph(x, x+eih±ejhia), ph(x, x-eih±ejhia), i i j,
are nonnegative. Also these choices guarantee that the coefficients sum to
unity, so that they can be considered to be transition probabilities for an
approximating Markov chain.

On Condition {3.12). Sometimes the condition (3.12) is not satisfied.


We now mention some simple ways to circumvent this. Other more sophis-
ticated ways will be discussed in the next section. The various alternative
methods indicate the versatility of the general Markov chain approximation
scheme.
If the condition (3.12) fails because one of the aij(x), i i j, is large rel-
ative to the diagonal term aii(x), then we can redo the finite difference
calculations with the finite difference grid rotated so that the coordinate
lines are more closely aligned with the principal directions of the matrix
a(x); i.e., use a linear transformation of the coordinates. A simple form of
this uses the diagonal directions as the primary ones. Even curvilinear co-
ordinates can be used, if the principal directions vary substantially with x,
but the required programming might be too difficult. The condition (3.12)
can also be relaxed by letting the finite difference interval depend on the
coordinate direction. This involves somewhat more work in programming
the algorithm than for the constant interval case, but it is a widely used
procedure. There are many variations to this approach, and one need only
keep track of the local consistency. We now illustrate it for a particular
two dimensional case. For this case, a simple linear transformation of the
coordinates (which diagonalizes the matrix a(·)) might actually be easier
to use in the sense that once it is done the subsequent programming will
be simpler, provided that the transformation does not complicate dealing
with whatever boundary conditions there might be.
Suppose that au(x) = 1,a12(x) = a21(x) = 2,a22(x) = 5, so that (3.12)
fails. Let hi denote the finite difference interval in the i-th coordinate direc-
tion. Now use the finite difference procedures (3.4), (3.5), (3.10) and (3.11),
5.3 The General Finite Difference Method 111

but with hiei replacing the hei. Let c = h2/h1. Then one can show that
the resulting coefficients are the transition probabilities and interpolation
interval for a Markov chain which is locally consistent with x( ·) provided
that
au - a12/c > 0, a22 - ca12 > 0. {3.16)
As shown in (94, Section 5), any value of c satisfying 2 < c < 2.5 will yield
a locally consistent chain. In the next section, it will be shown how to relax
{3.12) further with the use of transitions to nonneighboring states.

Simplifying the 4th(x, a), ph(x, y!a). As noted in Section 5.2, it is


sometimes useful to eliminate the dependence of the interpolation intervals
on the control parameter a and the procedures discussed in Section 5.2 can
be used for this. Suppose that {3.12) holds and that

{3.17)

Define a new interpolation interval (not depending on the control parame-


ter)
h2
~th(x) = . {3.18)
[Li aii(x)- Li,j:i~i lai;(x)l/2)
Then the interpolation intervals of {3.14) and (3.18) differ by O{h3 ), and
with the new interval, we still have local consistency with {3.1).
Recall that local consistency is needed to guarantee that the optimal
value functions for the Markov chain approximations will converge to the
optimal value function for (3.1) as h ---t 0. Different locally consistent ap-
proximations might have rather different properties for "noninfinitesimal"
values of h. Thus, one must exercise care with the use of any simplifications
such as those discussed here.
To eliminate the control dependence from the denominator Qh(x, a) of
the transition functions, we can proceed as in the comments on Example
4 which were made in Section 5.2, and we briefly discuss the idea. Define
Qh(x) = maxaEU Qh(x, a). Define the new ph(x, y!a), to be denoted by
ph(x, y!a), as the old ones were defined in {3.15), but with Qh(x) replacing
the Qh (x, a) there, and define the new interpolation interval by

The values of the transition probability which were just defined might sum
(over y) to less than unity for some values of x and a. To compensate for
this, we allow each state x to communicate with itself with the probability

ph(x,xia) = 1- L ph(x,yia). (3.19)


y:y~x
112 5. The Approximating Markov Chains

The new transition probabilities and interpolation interval are also locally
consistent with (3.1).

5.3.2 A two dimensional example: Splitting the opemtors


For illustrative purposes, we repeat some of the calculations of Subsection
5.2.3 for the two dimensional example:

b1(x)dt,
(3.20)
b2(x)dt + u(x)dt + u(x)dw.

Writing the operator CC11 defined by (3.2) in the split up form suggested by
(2.9), we can write (3.3) as

Let the control space be U = [-1, 1]. Now use (3.4) for the second deriva-
tive Wz: 2 x 2 (x, u), and use the one sided approximation (3.5) for Wx 1 (x, u).
Approximate each of the Wx 2 (x,u) terms separately. For each of these
Wx 2 (x,u) terms, use either the one sided or the two sided (symmetric)
difference approximation, depending on the magnitudes of the coefficients.
In particular, let u 2 (x) ~ 1 for all x, and use the one sided approxima-
tion for Wx 2 (x)b2(x). Then the two sided approximation can be used for
Wx 2 (x,u)a and we get for h ~ 1

If a one sided difference approximation is used also to approximate the


5.4 A Direct Construction 113

term Wx 2 (x,u)a, then we get

2 -± ±
h( h[ ) _a (x)f2+hb 2 (x)+ha
p x,x ± e2 a - Qh(x,a) , (3.22)

h2
~th(x) = Qh(x,a)'
Qh(x, a)= a 2 (x) + h[b 1 (x)[ + h[b2(x)[ + h[a[.

5.4 A Direct Construction of the Approximating


Markov Chain
In this section, it will be shown how to construct a locally consistent chain
without having to go through the details of the finite difference approxi-
mation. The method illustrates the considerable freedom that we have in
selecting the appropriate approximating chains. We restrict the description
to the case of the model (3.1), but the changes required for other cases (e.g.,
the singular control problem dealt with in Chapter 8 or the jump diffusion
case dealt with in Section 5.6 below) should be obvious. The finite differ-
ence method has the advantage of being essentially automatic and requires
little effort. It does have some shortcomings. For example, (3.12) needs to
hold, and this can cause a problem if the covariance matrix a(x) is too
"skewed." But as pointed out in Section 5.3, (3.12) can be relaxed if we
use a different difference interval in each coordinate direction or rotate the
coordinates, or by a combination of these two methods. Other methods,
such as "nonlocal" difference approximations also help. The method to be
discussed in this section includes all of the above possibilities, and takes
advantage of the intuition which we gained in our discussion of the finite
difference method. Three examples will be given and then the general pro-
cedure will be described. We will try to utilize the information which we
have concerning the forms of the transition probabilities which is provided
by {1.22), (2.1), (2.15), (3.7), and (3.15).
Essentially, the method decomposes the increments of ~e~ into compo-
nents, each of which is due to a different part of the dynamical effects, then
it gets a locally consistent transition probability for each component, and
combines them appropriately. We first give three examples and then the
general method. The method is quite powerful. It provides a good and often
simple way to view the construction of the approximating Markov chain.
114 5. The Approximating Markov Chains

Indeed, one should not rely too heavily on direct finite difference type meth-
ods, unless clearly convenient, without understanding their relationship to
the approach of this section.

5.4.1 An introductory example


Example 1. We start with an examination of the transition probabilities
in (3.7). Write the transition probabilities in (3.7) as

P:,..::{a~~ = ±eih} = aii~x) x normalization


(4.1)
+ hbr(x, a) x normalization,
where the normalization guarantees that the sum of the transition proba-
bilities is unity for each x, a. In fact, the normalization is just the familiar
expression 1/Qh(x,a) given by (3.7). The "separated" form of the right
hand side in (4.1) implies that we can split the transition into disjoint
possibilities, a transition due to "noise" [which yields the term aii(x)] and
one due to "drift" [which yields the drift term bf (x, a)]. This separation
is the key to the direct method. One gets a locally consistent transition
probability for each part of the system separately; i.e., for the two systems
dx = cr(x)dw and dx = b(x, a)dt. Then the separate forms are combined
with the appropriate weights and normalization factor.

A Decomposition of the 'lransition Probability. Suppose that we


have two sets of transition probabilities and interpolation intervals

which are locally consistent with the systems defined by


dx = cr(x)dw, (4.2a)
dx = b(x, u)dt, (4.2b)
respectively. By the definition of local consistency,

L)Y- x)(y- x)'p~(x, yia) = a(x)Llt~(x, a)+ o(Llt~(x, a)),


y
(4.3)
~)y- x)p~(x,yia) = o(Llt~(x,a))
y

and
~)y- x)(y- x}'p~(x,yia) = o(Llt~(x,a}},
y
(4.4)
L)Y- x)p~(x, yia) = b(x, a)Llt~(x, a)+ o(Llt~(x,a)).
y
5.4 A Direct Construction 115

We next show how to combine (4.3) and (4.4) to get the desired transi-
tion probabilities and interpolation intervals ph(x, yia), ~th(x, a) in (3.7).
This is done via a "coin toss." Choose system (4.2a) [i.e., the transition
probabilities p~(x, yia) giving (4.3)] with probability ph(alx, a), and the
system (4.2b) [i.e., the transition probabilities giving (4.4)] with probabil-
ity ph(blx, a) = 1- ph(alx, a), where ph(alx, a) is to be determined. The
transition probabilities determined by the coin toss will be the ph(x, yia).
Dropping the o(·) terms, local consistency with (3.1) requires that

(4.5a)

y (4.5b)
= b(x, a)~th(x, a).
Equations (4.5) imply that

P h( I )_ ~t~(x, a)
ax, a - u.tb
A h(
x, a ) + uta
A h(
x, a ) .
This implies that

ph(x, yia) = [P~(x, yia)~tg(x, a)+ pg(x, yia)~t~(x, a)] x normalization,


(4.6)
A h( ) ~t~(x, a)~tg(x, a)
u.t x,a = h h . (4.7)
~ta(x,a) + L\tb(x,a)
Because the normalization equals 1/(.6-t:(x, a)+ .6-t~(x, a)),

In (3.22), the transition probabilities were written as certain ratios. Such


forms are common. Suppose that we can write

h n~(x, yia) h ng(x, yia)


Pa(x,yia)= Qh( ) , Pb(x,yia)= Qh( )·
a x,a b x,a
By suitably scaling n~(x, yia) it can be supposed that

L(Y- x)(y- x)'n~(x,yia) = h2 a(x)


y

which yields that


116 5. The Approximating Markov Chains

An analogous calculation yields


h h
Lltb (x, a) = Qh( )"
b x,a
DefineQh(x,a) =Q~(x,a)+hQ~(x,a). Then (4.6) and (4.7) can be writ-
ten as
h( I ) _ n~(x, yla) + hn~(x, yla) (4.8)
p x, y a - Qh(x, a)
and
(4.9)

To gain confidence in the formulas (4.6) to (4.9), let us see how (3.7) can
be recovered. Define
Q~(x,a) = I>ii(x),
i

and use the locally consistent [with (4.2a) and (4.2b), respectively] transi-
tion probabilities and interpolation intervals
h aii(x)
Pa(x,x±heila) = 2 Q~(x,a)'
(4.10)
h I bf(x,a)
pdx,x ± hei a)= Qh( ),
b x,a

Using (4.10) in (4.6) and (4.7) yields


ph(x, x ± heila)
aii(x)/2 h bf(x,a) h2 ]
- [ + x normalization
- Q~(x,a) QNx,a) Qi(x,a) Q~(x,a)
= [ aii~x) + bf(x, a)h] x normalization

and
h h2
Llt (x, a) = Q~(x, a)+ hQNx, a)'
which are precisely the forms given by (3. 7) or, equivalently, by (4.8) and
(4.9). The general case follows the same lines.
It is generally the case that the individual interpolation intervals have
the representations in (4.10), for appropriate Qf(x, a). These representa-
tions imply the form of the interpolation interval (4.9) (where the drift
component is weighed by h in calculating the normalization), and the form
of the transition probabilities in (4.8).
Two special cases which illustrate variations of this procedure will be
described in the next two subsections.
5.4 A Direct Construction 117

5.4.2 Example 2. A degenemte covariance matrix


Consider the two dimensional case, where the system is defined by (3.1),
but the covariance a(·) is totally degenerate and takes the value a(x) =
a = q2 vv', where v is a column vector and q a real number. That is, the
system satisfies

dx = b(x,u)dt + qvdw, v = (f3I.!h), f3i > 0, (4.11)


and w( ·) is real valued. Thus, the effect of the noise is to "push" the system
in the directions ±v. See the illustration of the terms in Figure 5.1a, where
we let f3t = 1.
(0,1

Figure 5.1a. The noise direction. Example 2.

The degenerate structure of the noise covariance matrix suggests that the
part of the transitions of any approximating Markov chain which approxi-
mates the effects of the "noise" would move the chain in the directions ±v.
We next pursue the decomposition approach of Example 1. Let the state
space Sh be that indicated by the extension of the grid in Figure 5.1b.
The "boxes" are such that the diagonals are in the directions ±v and of
magnitude vh. Given h > 0, the grid has spacing h in the e 1 direction and
f32h in the e2 direction.

Figure 5.1b. The state space. Example 2.


118 5. The Approximating Markov Chains

One set of transition probabilities for a locally consistent chain for the
component of (4.11} which is represented by

dx = qvdw (4.12}

is IJ!(x, x±hvla) = 1/2. With these transition probabilities, the covariance


of the state transition can be written as

Thus, P!(x, yla) together with the interpolation interval A.t!(x, a) = h2jq2,
is locally consistent with (4.12}. Hence Q!(x, a)= q2 and na(x, x±vhla) =
q2/2
One possibility for the transition probability for the approximation to
(4.2b} is
p~(x,x ± .Bieihla) = bf~,a) x normalization, (4.13)

where the normalization is 1j[lb1(x,a)/.81l + lb2(x,a)/.82ll = 1/Q~(x,a).


Hence nb(x, x±eihla) = br(x, a)/ .Bi· The .Bi in the denominator in (4.13) is
needed to get local consistency. The local consistency of (4.13) with (4.2b)
is shown by the calculations

~(y-x)p~(x,yla)
y
h[ :~~~: ~~] x normalization

h
b(x,a)lb1(x,a)/.81l + l~(x,a)/.821
b(x, a)At~(x, a),

~)y- x)(y- x)'p~(x,yla) = o(At~(x,a)).


y

Now, we combine the above "partial" transition probabilities. Following


the guide of Example 1, note that the interpolation intervals have the form
of (4.10) with Q~(x,a) = q2 • Thus, as in (4.8) and (4.9), we weigh the
contribution of the diffusion and drift component to get

where
Q h( ) _ 2 hlb1(x,a)l hlb2(x, a)l
x, a - q + .81 + .82 ,
5.4 A Direct Construction 119

h2
~th(x,a) = Qh(x,a)"

Local consistency with the process {4.1) of this example follows by an


argument of the type used in the discussion in Example 1 and can also be
readily checked directly.

Extensions. It is not necessary that we use transitions to x ± hf3iei to get


the approximations for the drift term. Depending on the mean directions,
one could use the "corner" points in lieu of some of the x ± hf3iei. The tran-
sition probabilities just derived are analogous to what one would get with
the finite difference method if a one sided difference approximation were
used for the first derivative terms. This is evident because we are approxi-
mating the noise and drift terms separately here, so each needs to yield a
"partial" transition probability. However, once the transition probabilities
ph(x, yla) are determined, we might be able to modify them to get are-
sult analogous to that for the finite difference method with the symmetric
difference used for the first derivatives. Simply replace the bf-(x,a)//3i by

We only need to check whether the resulting expressions for the transition
probabilities are all nonnegative.

5.4.3 Example 3
We next deal with a special two dimensional case with a nondegenerate
covariance. Refer to Figure 5.2a for an illustration of the terms and the
specification of the values of f3i·

f3t2 = .5

Figure 5.2a. The principle directions of the noise.

Let there be vectors (as in the figure) Vi of the form Vt = (f3n, f3t2) and
v2 = (- f32t, /322), where we set f3n = /322 = 1. Assume that there are
120 5. The Approximating Markov Chains

positive real numbers Qi such that a(x) =a= qrv 1 v~ + q~v2 v~. Thus, we
can write the process as
(4.14)
where the wi(·) are mutually independent one dimensional Wiener pro-
cesses.

We follow the development of the previous two examples by first decom-


posing the transition probability into component parts, some of which are
due to the noise effects and some to the drift, and then combining the parts.
Refer to Figure 5.2b for a description of the state space and transitions.
For this example, the part of the approximating chain which concerns the
noise term will take a state X E Sh into other states in the directions ±v1
or ±v2. In particular, the transitions concerned with the component of the
covariance which is represented by qrviv~ should move the chain in the
directions ±vi.

Figure 5.2b. Transition directions. Example 3.

The development of the transition probabilities for the subsystems in


Examples 1 and 2 can be nested. In the present case, we can deal with the
subsystems dx = q1 v1 dw1 and dx = q2v2dw2 separately, and then combine
the results. Using the method of Example 2 yields the following formulas:

ph(x,x±v1hla) = (qU2) x normalization constant,


ph(x, x ± v2hla) = (qV2) x normalization constant,
ph(x,x±eihla) = hbt(x,a) x normalization constant.
In order that the transition probabilities which were just defined add to
unity, the normalization constant needs to be
Qh(x, a)= qr + q~ + hlbl(x, a)l + hlb2(x, a)j.
5.4 A Direct Construction 121

If the interpolation time interval is defined by h2 jQh(x, a), then the chain
is locally consistent with the systems model of this example. Because the
transitions are not only to the nearest neighbors, more memory is required.

5.4.4 A general method


The decomposition scheme used in the above examples can be described
for a general problem as follows. For each point x, let there be a set of
vectors M(x) = {vi(x), i:::; m(x)}, where m(x) is uniformly bounded in x.
For X E sh, define the set yh(x) = {y : y = X+ hvi(x), i :::; m(x)}, and
suppose that Yh(x) c Sh· Local consistency with (3.1) implies that

L ph(x,x + hvi(x)la)hvi(x) = b(x,a)~th(x,a) + o(~th(x,a)),


iEM(x)
L ph(x,x + hvi(x)la)hvi(x)(hvHx))
iEM(x)

(4.15)
where ph(x, yla) are the transition probabilities and ~th(x, a) --+ 0, as
h--+ 0.
A convenient way of getting the transition probabilities follows the ap-
proach of the above examples by treating the transition probabilities for
the noise and drift components separately and then adding them up with
the appropriate weight and normalization factor. Suppose that qf(x) and
q?(x, a) are nonnegative (see the remarks on extensions below) and satisfy

b(x, a) = L q?(x, a)vi(x), (4.16a)


iEM(x)

a(x) = L qf(x)vi(x)v~(x), (4.16b)


iEM(x)

L qf(x)vi(x) = 0. (4.17)
iEM(x)

Define
Qh(x, a)= L [hq?(x, a)+ qf(x)]. (4.18)
iEM(x)

If the covariance a(x) depends on the control, then so would the qf{-).
Define the interpolation interval

(4.19)
122 5. The Approximating Markov Chains

and suppose that it goes to zero as h -+ 0. Then (4.19) and the transition
probabilities defined by

(4.20)

are locally consistent with (3.1).


This method contains Examples 2 and 3 of this section. It reduces to that
of Section 5.3, when the {vi(x)} is the set of vectors {±ei,ei ± ej, -ei ±
ej, i ::f: j, i,j}, and the values qJ(x, a)= bt(x, a) are used with the vectors
Vj(x) of the form ±ei and the qJ(x) are determined in an obvious way from
(3.15).

Comment. (4.17) is used to guarantee that the qf(x) do not contribute


to the mean direction. If q?(x, a) are set equal to zero, then (4.17) and
(4.16b) imply that (4.19) and (4.20) are locally consistent with (4.2a). The
comments made at the end of Example 2 concerning modifications of the
contributions of the "drift" components hold here also. We note also that
the q? (x, a) need not actually all be positive. It is required only that

(4.21)

Another Example: Correlated Noise. The following example illus-


trates a useful decomposition. The system is two dimensional and

dx = b(x)dt + dw,

but where the components of w( ·) are correlated. The process w( ·) can be


written in terms of three mutually independent Wiener processes as

w(t) = ( wl(t))
w 2 (t) + vw3(t).
With this decomposition, and the grid spacing depending on the direction,
we can easily get a locally consistent approximation.

5.5 Variable Grids


Up to this point, the major objective was to get a locally consistent ap-
proximating Markov chain. It is not always possible to get a chain which
is locally consistent at all points of Sh for all h. Loosely speaking, the cri-
terion for local consistency can be relaxed on a set of points if the limit
process spends negligible time on arbitrarily small neighborhoods of that
set and the possible values that the coefficients might take on that set do
5.5 Variable Grids 123

not affect the limit process. This point will be discussed further in Chapter
10. The probabilistic interpretation comes to our aid here also and allows
us to treat a relaxation of the consistency and continuity conditions which
might be difficult to achieve via the analytic methods of numerical analysis.
The problem occurs in practice in situations such as the following. Consider
a problem where the set G of interest for the control problem is divided
into disjoint subsets {G 1 , ... } , and we wish to have a different "level" of
approximation on each Gi. For example, a different spacing might be used
for the approximating grid on each subset. Suppose that the state space or
approximating grid is Sh and that we can find a suitable locally consistent
chain on the parts of Sh which are interior to each Gi. Due to the discon-
tinuities in the grid sizes, it might not be possible to get local consistency
at all points on the boundaries separating the Gi.
The point will be illustrated via a particular two dimensional example
with Sh shown in Figure 5.3a.

u~ 'U<J

y4 Yf lfl '/17 A
-v

y IU1
y2 u~

Gl

j ~~

Figure 5.3a. A variable grid


We continue to use the model (3.1). The set G is the entire two dimensional
space, and G 1 is the "southeast" subset with the fine grid. The boundary
is divided into two disjoint segments, denoted by A0 and A 1 • On G 1 , the
spacing interval is h/2; on the rest of G it is h. We need not use a square
grid, but it simplifies the discussion. The set Sh is the collection of all
points on the grids. Suppose that we can get a locally consistent approx-
imating chain at all points not on the boundary 8G 1 = A 0 U A 1 • Owing
to the regularity of the grids interior to both G 1 and G - G 1 , this could
be done by any of the methods of the previous sections, under appropriate
conditions on the covariance matrix. The same methods can be used to get
a locally consistent transition probability on the boundary points which
are in the coarse grid. But there might be problems at the other points on
the boundary because these do not connect to their neighbors in the same
124 5. The Approximating Markov Chains

"symmetric" way that the other points do. The main difficulties as well as
their resolution can be seen via a particular example.

A Two Dimensional Example. Consider the degenerate two dimen-


sional case given by {5.1), where the bi(x, a) are bounded and continuous:

b1(x,a)dt,
{5.1)
b2(x, a)dt + adw, a> 0.

The two boundary segments will be examined separately. It will first be


shown that a locally consistent transition probability can be constructed at
the point Yo on All and hence for all the points on A1. We will use the direct
method of Section 5.4, and deal with the drift and noise terms separately,
and then combine them with the appropriate weights and normalization.
In particular, the formulas {4.16) and (4.20) will be used.

The Calculation at y 0 • Define the vectors vi(Yo), i = 1, ... , 5, by

These are represented in Figure 5.3b. In particular,

vs(Yo) = -va(Yo) = e2j2,


v4(Yo) = e2/2- ell v2(Yo) = -e2/2- e1.
It is easy to find qJ(x,a) which yield {4.16a): Use

q~(yo, a) 2bt(Yo,a),
qg(yo, a) 2b2(Yo, a),
q?(Yo, a) 2bi(Yo,a),
qg(yo, a) b!(Yo,a)/2,
q~(yo, a) b!{yo,a)/2.

We can also write

a(yo) = ( ~ ~2) = 2a 2 [vs(Yo)v~(Yo) + va(Yo)v~(yo)].


Thus, we can use q~(y0 ) = qHyo) = 2a 2 , and then (4.19) and {4.20) are
locally consistent with {5.1) at Yo·
5.5 Variable Grids 125

Figure 5.3b. Transitions from Yo·

The Calculation at Y6· It will next be shown that it is not possible to


get a locally consistent transition probability at Y6· Nevertheless, one can
get close enough so that the limit processes and value function are the
desired ones. Following the procedure of Section 5.4, as done above, define
the vectors vi(Y6), i = 1, 5, 7, 8, 9, by

The possible transitions are shown in Figure 5.3c.

Yt

Figure 5.3c. Transitions from Y6·

It is possible to get a representation of the drift b(y6 , a) in terms of these


vi(Y6) such that (4.16a) holds. The problem arises with the representation
of the covariance matrix
(5.2)
of the type needed for (4.16b). The difficulty is due to the fact that none
of the vi(Y6) equal -vt(Y6)· The results of Theorem 10.5.3 imply that the
desired convergence will occur if we get a transition probability which is
126 5. The Approximating Markov Chains

locally consistent everywhere except on Ao, where local consistency does


not hold, and instead the chain satisfies

c2h 2I 2:: cov~:~~e~ 2:: cth 2I, ci > 0. (5.3)


Consider an "approximation" to (5.2) of the form

where qt(y6) and qJ(y6) are positive. In order not to introduce a very large
value of E~~~~e~, it is required that the mean change implied by (5.4) is
zero; i.e.,
(5.5)
This is a critical condition and implies that qt(y6) = 4qJ(y6)· Choose

Then
2
a= ~ [4vt(Y6)v~ (y6) + vs(Y6)v~(Y6) + vg(Y6)v~(Y6)], (5.6)

or
-- 2[1/60 0]1 .
a-a

Summary and Discussion. The transition probabilities and interpola-


tion interval just constructed are locally consistent with (5.1) at all points
except at the points on Ao which are in the fine grid but not on the coarse
grid. At these points there is local consistency, except for the added "noise
term" in the horizontal direction. Asymptotically, the behavior on A0 does
not affect the form of the limits for two reasons: First, the fraction of time
that the limit process will spend in a small neighborhood of A0 vanishes
as the size of that neighborhood goes to zero. Second, the behavior of the
approximating chain in a neighborhood of the boundary does not affect the
limit processes (as h -t 0), due to the fact that a f. 0 and

varh,a(~th)
x,n
> k 1 h2 '
<,n 2 - (5.7)

where k 1 > 0 and (~e~h is the component in the vertical direction. In fact,
the properties in the last sentence are the most important, provided that
the "drift" IE~·:: ~e~ I/ ~t~ is bounded in a neighborhood of the boundary.
Otherwise, the' form of the approximation used on Ao is not important.
5.6 Jump Diffusion Processes 127

5.6 Jump Diffusion Processes


5. 6.1 The jump diffusion process model: Recapitulation
The previous sections have been concerned with the problem of getting a
locally consistent Markov chain approximation for the diffusion model (3.1).
If such a locally consistent chain is available, then the extension to a locally
consistent chain for the jump diffusion of Section 1.5 is straightforward.
First, we review some properties of the jump diffusion process. Let us recall
the jump diffusion model (1.5.6):

dx = b(x, u)dt + a(x)dw + dJ. (6.1)


The jump term was represented in Section 1.5 in terms of a function q( ·) and
a Poisson measure N(·) of intensity >.dt x II(dp), where II(·) has compact
support r. The function q( ·) is assumed to be bounded and measurable
and is continuous in x for each value of p. In terms of the Poisson measure,
the jump term has the representation

J(t) =lot 1r q(x(s-),p)N(dsdp).


The use of the Poisson measure is just a convenient way of keeping track
of the jump times and values. We will next recapitulate some of the basic
facts and implications of Section 1.5 which will be useful for constructing
the approximating Markov chain and for the convergence theorems.
There is an intuitively simple, but equivalent, way to define the process
(6.1) and the jump term, by working with the jump times and values di-
rectly. To do this, let Vn, n 2: 1, denote the time of the n-th jump, and set
Vo = 0. Let {vn+l- Vn, Pn, n < oo} be mutually independent random vari-
ables with Vn+l- Vn being exponentially distributed with mean value 1/ >.,
and let the Pn have the common distribution II(·). In addition, for each n,
let {vi+l- vi, Pi, i 2: n} be independent of {x(s),s < Vn, vi+l- vi, pi, i <
n}. There are {Vn, Pn} satisfying these conditions such that the n-th jump
of the process x(·) is q(x(vn-),pn), and the jump term is

J(t) = L q(x(vn-),Pn)· (6.2)


lln:::;t

We can define the Poisson measure in terms of these quantities as follows,


where His a measurable subset of r: Define N(t, H) = N([O, t] x H) where

N(t,r) = max{n: Vn:::; t} = LI{vn9}•


n

(6.3)
n
128 5. The Approximating Markov Chains

The term N(t, H) is just the number of jumps with values in the setH on
the interval [0, t], and N(t, r) = number of jumps on the interval [0, t].

A Convention Concerning "Zero" Jumps. Let p be a random variable


with the distribution II(·). It is conceivable that q(x, p) = 0 with a positive
probability for some x. Equivalently, it is possible that q( x( Vn-), Pn) = 0
with a positive probability for some n. In this case, the actual physical
process x( ·) does not jump at Vn even though the "driving" process N (·)
does. For the sake of uniformity of expression, we still say that there is a
jump at that time, but with value zero.

Local Properties of the Jumps of (6.1). Because Vn+l - Vn is expo-


nentially distributed, we can write

P{x( ·)jumps on [t, t + ~)lx(s), w(s), N(s), s :::; t} =A~+ o(~). (6.4)

Define
fr(x, H)= II{p: q(x, p) E H}.
By the independence properties and the definition of Pn, for H c r we
have
P{x(t)- x(t-) E HI jump at t,x(t-) = x,w(s),x(s),N(s),s < t}
= II{p: q(x(t- ), p) E H} = fr(x(t- ), H).
(6.5)

5.6.2 Constructing the approximating Markov chain


It is implied by the above discussion that the jump diffusion x( ·) satisfying
(6.1) can be viewed as a process which evolves as a diffusion (3.1), with
jumps which occur at random according to the rate defined by (6.4). If we
wish to delete the jumps of magnitude "zero," then we can use the state
dependent jump rate

A(x) = AII{p: q(x, p) # 0}.


We will, in fact, always allow zero jumps and use the constant jump rate
A. Given that the n-th jump occurs at time Vn, we construct its values
according to the conditional probability law (6.5) or, equivalently, write it
as q(x(vn-), Pn), and then let the process continue as a diffusion (3.1) until
the time of the next jump. The locally consistent approximating Markov
chain can be constructed in an analogous way, and we now proceed to give
the details.

Definition of Local Consistency. The desired approximating Markov


chain{~~. n < oo} needs to mimic the local behavior of (6.1) in a way that
5.6 Jump Diffusion Processes 129

is analogous to what was required by (4.1.3) for the diffusion model. The
only difference between the cases of Section 4.1 [i.e., (3.1)] and our case
(6.1) is the presence of the jumps. Let Qh ( ·) be a bounded (uniformly in h)
measurable function such that

iqh(x, p) - q(x, p)i-t 0,

uniformly in x in any compact set for each p E r, and which satisfies x +


Qh(X, p) E Sh for x E Sh. The Qh(x, p) will be used to get the approximations
to the jump terms of (6.1). The condition that x + Qh(x, p) takes points in
Sh to points in Sh for each possible value p E r is needed because Sh is
the state space of the chain.
A controlled Markov chain {e~, n < oo} is said to be locally consistent
with (6.1) if there is an interpolation interval flth(x, o) which goes to zero
as h -t 0 uniformly in x and o, and such that:
(a) There is a transition probability p~(x, yin) which (together with an
interpolation interval flth (x, o)) is locally consistent with (3.1) in the sense
that (4.1.3) holds;
(b) There is <)h (x, o) = o( flth (x, o)) such that the one-step transition
probability ph(x, yin) for the chain can be represented in the factored form:

ph(x, yin) = (1- ALlth(x, a)- 8h(x, o))p~(x, yin)


+ (ALlth(x, o) + 8h(x, o))II{p: Qh(x, p) = y- x }.
(6.6)

An Interpretation of (6.6). The rule (6.6) is equivalent to the following


procedure. Suppose that the state has just changed to e~ = X and the con-
trol to u~ = o. The next interpolation interval either is flth(x, a) or has
conditional mean value flth(x, o). Now, determine the state e~H· With
probability (conditioned on the past data) (1- flth(x, o) + o(flth(x, o))),
e~+ 1 is determined by p~(·), the diffusion transition probability. With con-
ditional probability flth(x, o) + o(flth(x, o)), there is a "big" jump in that
e~+l = X+ Qh(x, p), where p has the distribution II(·) and is "indepen-
dent of the past." Let the discrete times at which these jumps occur be
denoted by v~, n = 1, .... Then, there is a sequence of random variables
{Pn, n < oo} where Pn has the distribution II(·) and is "independent of the
past " {t:h h . h h .
'>i,ui,t<VniPi,Vj,J<n,vn h} ,antd henextsateot . .
f t hech am1s
defined by
(6.7)

The Dynamic Programming Equation. The definition of local consis-


tency which was used above seems to be the simplest. Let G be a given set
and suppose that the cost function for the chain is the analogue of ( 4.2.2)
130 5. The Approximating Markov Chains

(the analogues of (4.2.3) and (4.3.6) are treated similarly)

where /3 > 0 and Nh is the first exit time from the set cg = Sh n G 0 . Then
the dynamic programming equation is, for X E GR,

Vh(x) = ~JD [e-f3<lth(x,a)(1 - .\~th(x, a) - oh(x, a)) I.>~(x, yia)Vh(y)


y

+ e-f3<lth(x,a)(.\~th(x, a)+ oh(x, a)) [ Vh(x + Qh(x, p))II(dp)


+ k(x,a)~th(x,a)],

with the boundary conditions Vh(x) = g(x), X~ cg.


This equation might seem formidable, but in applications it generally
takes a simple form. For example, consider the special two dimensional
case where (6.1) takes the form

dx = b(x, u)dt + a(x)dw + ( dj) ,

where the jumps in J(·) are either ±1, each with probability 1/2. Let 1/h
be an integer, and suppose that the points in sh are h units apart in the
e2 direction. Then the integral in the above expression is just

Of course, any suitable approximation to the exponentials in the equation


for Vh(x) can be used, and oh(x,a) can be set to zero if we wish.

The Interpolated Process 1/Jh(·). The continuous parameter Markov


process interpolation is defined exactly as in Section 4.3, and we continue
to use the terminology of that section. Recall the definition of the jump
times rt: given above (4.3.1). We need to distinguish the jump times of
1/Jh(·) which are due to the approximation of J(·) from the other jump
times. Define vt: = rhh.
vn
Then

(6.8)

See Figure 5.4 for a concrete illustration of the terms.


5.6 Jump Diffusion Processes 131

e~

et Qh(e~, pt)
e!l
e~
e3

I.II .II .11 _ ,,II


t

Figure 5.4. The continuous time interpolation.

5.6.3 A convenient representation of {€~, n < oo} and 'l/Jh(·)


It will be shown next that .,ph(·) can be represented in the form (4.3.9)
with a "jump term" added. This representation will be quite useful in the
proofs of the convergence theorems of Chapter 10. Let H~ denote the event
that e~+1 is determined by the diffusion transition probability pi(·), and
let lHh be its indicator function. Set lrh = 1- lHh· Let e8 = x, and let
E~ de~ote the expectation conditioned o~ the "data"up to time n;" i.e., on
V~ = {ef,u~,Hf, i ~ n;vj,pi: vj < n}. We can write
n-1 n-1
e~ = x+ EAefiHf + EAefir,h
i=O i=O
n-1 n-1
x+ LEfAefiHf' + :L!Aef -EfAeflJHf + L Qh(e:h,pi)·
.
=
i=O i=O i:vh <n '

Using the values of EfAef from Section 4.3, the last equation can be
written as
n-1
e~ = x+ I)b(ef,u~)At~ +o(At~)]JHf +M~ + J!, (6.9)
i=O

where M! and J~ are the middle and right hand sums, respectively, in
the previous equation. Due to the centering of its terms about the condi-

the filtration determined by v:


tional expectation, given the "past," M~ is a martingale with respect to
and has the quadratic variation
n-1
:~.)a(ef)At~ + o(At~)]JHf'. (6.10)
i=O
132 5. The Approximating Markov Chains

For each t,
E[number of n : v~ :::; t] -+ >.t
ash-+ 0. This implies that we can drop the IHh in (6.9) and (6.10) with
no effect in the limit. Using this reasoning, and the reasoning which led to
the representation (4.3.9), we can write

(6.11)

where Esups<t 15f(s)l -+ 0 ash-+ 0. In (6.11), uh(·) is the interpolation


of the controCsequence {u~, n < oo} and

Jh(t) = L qh('I/Jh(v~- ), Pn)· (6.12)


n:v::::;;t

Mh(-) is a martingale whose discontinuities go to zero ash-+ 0, and with


quadratic variation
lot a('I/Jh(s))ds + o~(t),
where Esup89 lc5q(s)l-+ 0 ash-+ 0.

5. 7 Approximations for Reflecting Boundaries


5. 7.1 General discussion
Suppose that the controlled process is of interest in a compact set G which is
the closure of its interior G0 • The approximations of the controlled processes
in Sections 5.1 to 5.6 have been concerned with getting Markov chains which
are locally consistent with (3.1) or {6.1) at all points x in a state space Sh,
and the behavior on a boundary was not considered. If the original control
process (3.1) or {6.1) is stopped as soon as the sample path leaves the
open set ao' then for the control problem for the approximating chain
one simply stops the chain the first time that it leaves ag = G0 n sh.
For part of the problems discussed in Section 1.4 and in Chapters 3 and
8, the boundary is reflecting or constraining. For these cases, we need to
reflect or project the approximating chain back into the set G whenever
it leaves G, in a way that is consistent with the set of allowed reflection
directions. Essentially all that we need when~~ = x is not in Gh = G n
Sh is that the conditional mean change Ef;;:: ~~~ approximate the desired
reflection direction. This can be readily done. It will be seen that, as with
the construction of the locally consistent chain itself, there are numerous
possibilities, and, generally speaking, any intuitively reasonable method
can be used.
5.7 Reflecting Boundaries 133

The Reflected Diffusion Model. We begin by considering the reflected


diffusion model discussed in Section 1.4, and comment later on the adjust-
ments needed for the reflected jump diffusion. Let us recapitulate the re-
flecting diffusion model and the associated conditions of Section 1.4. Thus,
for each x E 8G, we are given a set of direction vectors r(x) whose members
are of unit length. In many cases, the set contains only a single vector, and
then we abuse the terminology and refer to that singleton as r(x) also.
The model of the reflected diffusion given in Section 1.4 is that of the
Skorokhod Problem

x(t) = x +lot b(x(s), u(s))ds +lot u(x(s))dw(s) + z(t), (7.1)

where the "reflecting" term z( ·) is continuous and keeps the process x( ·)


from leaving G. It is required to satisfy

jzj(t) =lot Iaa(x(s))djzj(s), z(t) =lot -y(s)djzj(s), (7.2)

where -y(s) E r(x(s)) almost surely (w, s) with respect to the random mea-
sure induced by izi(·).
We next list the assumptions on r( ·) that will be used. Following the
statements of the assumptions is a discussion of their significance and ref-
erences to the literature.
The following conditions are assumed to hold. These conditions will be
used in the convergence theorems of Chapter 11.
(i) For each x, the positive cone generated by the vectors in r(x) is
convex.
(ii) The set G can be constructed as the intersection of a finite number
of "smooth" sets in the following way. There are a finite number of contin-
uously differentiable functions gi(·) and sets Gi = {x : Yi(x) :=::; 0} whose
boundaries are 8Gi = {x: Yi(x) = 0} and such that G = niGi. The sets G
and each Gi are assumed to be the closure of their interiors.
(iii) Suppose that for a given X E aa there is a single index i = i(x)
such that 9i(x) = 0, and let n(x) denote the interior normal to aai(x) at
x. Then the inner product of all the vectors in r(x) with n(x) is positive.
(iv) Define the index set l(x) = {i: x E 8Gi}· I(x) is upper semicontinu-
ous in the sense that if X E 8G, there is 6 > 0 such that jx- yj < 6 implies
that l(y) c I(x). Next, suppose that X E aa lies in the intersection of
more than one boundary; i.e, I(x) has the form I(x) = {i(1), ... , i(k)}
for some k > 1. Let N(x) denote the convex hull of the interior nor-
mals ni(l)' ... 'ni(k) to aGi(l)' ... 'aai(k) at X. Let there be some vector
v E N(x), such that -y'v > 0 for all 'Y E r(x).
In (7.1) and (7.2), r(x) needed to be defined only for x E 8G. For our pur-
poses of "approximation," the definition needs to be extended so that r(x)
is defined and satisfies an appropriate "upper semicontinuity" condition
134 5. The Approximating Markov Chains

in an appropriate "outer" neighborhood of the boundary. More precisely,


assume the following.
(v) Let N(G) ::J G be a neighborhood of G. There exists an extension of
r( ·) to N (G) - G0 which is upper semicontinuous in the following sense: Let
Xn ¢ G0 • If Xn--+ X E 8G and if "'n--+ "(with "fn E r(xn), then"( E r{x).
The set N (G) - G0 will contain the "discretized" reflecting boundary
act defined in the next subsection.
There are two forms of the boundary which are usually of interest. The
first is where the boundary surfaces are smooth, but curve. The second
is where G is a convex polyhedron and the reflection direction ri in the
relative interior of face 8Gi is unique. For the latter case, we can write z( ·)
as
z(t) =L riYi(t), {7.3)

where the Yi(·) are continuous, nondecreasing, satisfy Yi(O) = 0 and can
increase only at t where x(t) E 8Gi. The representation (7.3) is not neces-
sarily unique without further conditions, in the sense that z( ·) might not
determine Y(·) uniquely. If the covariance matrix a(·) is uniformly nonde-
generate, (70],(100, Chapter 4] then the contribution to z(·) is zero (with
probability one) during the times that x(t) E 8G. Then z(·) determines
Y ( ·) with probability one. This issue will be returned to in Chapter 11.

Remark. The theory of reflecting diffusion processes is currently an ac-


tive area of research. To meet the demands of various application areas,
a wide variety of assumptions on G and r(·) have been considered in the
literature. It seems likely that this trend wll continue, with even more un-
usual assumptions on the reflection term becoming commonplace. On the
other hand, the weakest assumptions that are required so that weak sense
uniqueness and existence hold are still far from clear. For these reasons
two criteria have been applied in formulating the assumptions used in this
book. The first criterion requires that the assumptions be flexible enough
to accommodate as many sets of assumptions that are of current interest
as possible. At the same time, the assumptions used must be strong enough
that the approximations to reflecting diffusions that occur in the numeri-
cal schemes can be shown to converge. The set of conditions given above
satisfy this second requirement, and yet are flexible enough that they simul-
taneously cover a number of quite different cases of interest. In particular,
they include those which arise in the "heavy traffic" models of Chapter
8, where the reflection directions are discontinuous and multivalued at the
"corners" . In most current applications, the set r{ x) contains only one di-
rection, except possibly at the edges and "corners" of 8G, where it is in
the convex hull of the directions on the adjoining faces.
In the remainder of this remark we will try to motivate each of the
assumptions and also cite the relevant literature regarding existence and
uniqueness results.
5. 7 Reflecting Boundaries 135

Conditions (i) and (v) are standard in the literature, although they may
be implied rather than explicitly stated. They are related to the fact that
reflecting diffusions often appear as weak limits. For example, in the "heavy
traffic" models of Chapter 8 the set G is usually the nonnegative orthant in
some Euclidean space, and r(x) is independent of x and single valued on the
relative interior of each "face" of G. The definition of r(x) is then extended
to all of {)G by the sort of "semicontinuity" assumed in (v) together with
the convexity assumed in (i). Assumption (ii) describes the regularity of the
boundary. It is formulated in such a way that is covers both the classical
case in which oG is a smooth manifold as well as the setting in which
G is a polytope. The formulation of conditions (i), (ii) and (v) follows
[41], which considers questions of strong existence and uniqueness. Related
works that also deal with strong existence and uniqueness of solutions are
[69, 115, 134, 147].
The most interesting conditions are (iii) and its extension in (iv). Ba-
sically, these assumptions guarantee that the reflection term must always
point "into" the domain G. They allow a key estimate on the reflection term
of the process in terms of the drift and diffusion components. This estimate
will be exploited in Chapter 11 to prove the weak convergence of processes
that is needed for the convergence proofs for the numerical schemes. This
type of assumption has been used previously in proving existence in [42].
For the special case when G is the r-dimensional nonnegative orthant in
JW and r(x) is independent of x and single valued on the relative interior of
each ''face" of oG, the condition (iv) is equivalent to the "completely-S"
condition used in [149, 148]. In these papers the condition has been used
to prove weak existence and uniqueness for the special case of a reflecting
Wiener process (i.e. b(·, ·) = 0 and cr(·) = cr).
As pointed out in Chapter 1, the term "constrained process" is often more
appropriate than the term "reflected process." In many of the applications
in which reflecting diffusion models arise, the reflection term is actually
a constraining term. For example, in the so-called "heavy traffic" models
discussed in Chapter 8, the reflection term is what keeps the buffers of cer-
tain queues from either becoming negative or overflowing. This is more of
a constraint than a reflection, in the physical sense. Reflecting boundaries
are often artificially introduced in order to get a bounded region in which
the numerical approximation can be carried out. Many control problems
are originally defined in an unbounded space, which is inconvenient for nu-
merical purposes. One tries to truncate the space in such a way that the
essential features of the control problem are retained. We can do this by
simply stopping the process when it first leaves some fixed and suitably
large set. One must then introduce some boundary condition or cost on the
stopping set. The objective is to choose a boundary cost that is believed to
be close to whatever the cost would be at these points for the untruncated
process. In particular, we do not want the chosen boundary and boundary
condition to seriously distort the optimal control at points not near that
136 5. The Approximating Markov Chains

boundary. An alternative is to introduce an appropriate reflecting bound-


ary.

5. 7.2 Locally consistent approximations on the boundary


Recall the definition Gh = GnSh. Assume that the transition probabilities
and interpolation interval ph(x, yia), flth(x, a), respectively, of a Markov
chain which is locally consistent with (3.1) have already been defined for all
points in Sh. Let the distance between communicating states of this chain
be bounded above and below by some positive constant times h. The aim
of the reflection or constraint is to keep the process in the set G, if it ever
attempts to leave it. We will use act to denote the "reflecting boundary"
for the approximating chain. The set act is always taken to be disjoint
from Gh. This is often a convenience in programming, although one can
redo the formulation so that the "reflecting boundary" is simply "near"
aG whether on the inside or outside. The transition probabilities at the
states in act are chosen so as to "mimic" the behavior of the reflection
for (7.1) and (7.2). The reflection direction is not controlled here, although
there are some comments on the algorithm when such boundary controls
are allowed at the end of Chapter 7. See [86, 109) for an example where
the reflection directions are controlled and [97) for the convergence of the
associated numerical algorithm. Thus, we may use ph(x, y) to denote the
transition function for points X E act. The reflecting boundary is defined
in a natural way as follows. act includes all points yin Sh- Gh for which
ph(x, yia) > 0 for some x E Gh and a E U. It also is "closed" in the sense
that it includes all points y such that ph(x, y) > 0 for X E act. We suppose
that the "radius" of act goes to zero as h ~ 0 in the sense that

lim sup dist (x, G)= 0.


h-tO xE8Gt

Let {~~, n < oo} be a Markov process with the transition probabilities
ph(x, y) on act and ph(x, yia) on Gh. We say that the transition function
ph(x,y) is locally consistent with the reflection directions r(·) if there are
fl > 0 and Ci > 0 such that for all X E aGt and all h,

E;;::(~~+l- ~~) E {0-y + o(h): c2h ~ (} ~ c1h,-y E r(x)}, (7.4a)

cov~;~(~~+l - ~~) = O(h 2 ), (7.4b)


ph(x,Gh) ~ fl, all hand X E act. (7.4c)
Condition (7.4c) is convenient, but it is sufficient to let it hold for the k-step
transition probabilities, where k is an integer not depending on x. Because
the chain is not controlled on act, the a in the E;;::
is redundant. Thus
5. 7 Reflecting Boundaries 137

the mean direction of the increment is an "admissible reflection direction"


plus a "small" error. If a Markov chain is locally consistent with {3.1)
in G and is also locally consistent with the reflection directions r(x), as
defined above, then we say that the chain is locally consistent with the
reflected diffusion (7.1), (7.2). See Figure 5.5 for an illustration, where G is
the polyhedral area, the circles are part of aat' and we suppose that the
"circles" communicate only with each other or with points in Gh, and that
points in Gh communicate only with points in Gh u 8Gt.

/
aG + h /
[:::7

J
v G

J
1/

""
""'
Figure 5.5. The boundary approximation.

The conditions {7.4) are needed for the convergence theorem in Chapter
11. For any fixed value of h, we would try to choose the ph (X' y) for X E aat
so that the behavior of the chain on aat copies as closely as possible the
behavior of the physical model on 8G. This is particularly helpful when
there is nonuniqueness of the directions. See the comments at the end of
Example 2 below.

The Interpolated Process ..ph (·). For x E 8Gt, we define the interpo-
lation interval tl.th(x) to be zero. The reason for this is the correspondence
of the set aat with the reflecting boundary 8G, and the fact that the role
of the reflection is to merely keep the process from leaving the desired state
space G. Indeed, the "instantaneous" character is inherent in the definition
of the reflected diffusion {7.1), (7.2).
Definer~ as in Section 4.3, but use the interpolation intervaltl.th(x) = 0
for x E 8G+. Thus, if e~ E 8Gt (i.e., n is the time of a "reflection"
step), then r~+l = r~. Define the continuous time parameter Markov chain
interpolation 'lj!h(·) by (4.3.1'). Note that the expression (4.3.1) cannot be
used to define 'lj!h(·), because (4.3.1) is multi-valued at those n which are
reflection steps. An alternative to (4.3.1') is: 'lj!h(r~) = e~(n)' where m(n) =
138 5. The Approximating Markov Chains

max{i : rih = r~}. Thus the values of the states at the moments of the
reflection steps do not appear in the definition of the interpolation. In this
sense these states are "instantaneous."
The reflecting states can actually be removed from the problem formu-
lation by using the multistep transition functions which eliminate them. It
is often useful to retain them to simplify the coding of the computational
algorithms for the solution of the dynamic programming equations of, e.g.,
Section 5.8 below, as well as to facilitate the convergence proofs.

5. 7. 3 The continuous parameter Markov chain interpolation


For use in Chapter 11, we now give a representation of the continuous
parameter interpolation which is analogous to (6.11) or (4.3.9). Let n be a
reflection step and define Llz~ by Llz~ = E~~e~ = Llzh(e~), where Llzh(·)
satisfies (7.4). Define Llz~ by ae~ = Llz~ + Llz~ and set Llz~ = Llz~ = 0 if
n is not a reflection step. Define the continuous parameter interpolations

Then (6.11) (or (4.3.9) if there is no jump term) can be extended as

'1/Jh(t) = x+ 1t b('I/Jh(s), uh(s))ds + Mh(t) + Jh(t) + zh(t) + zh(t) + 8~(t),


(7.5)
where all the terms are as in (6.11), except for the just defined reflection
terms.
It turns out that sups<t lzh(s)l --t 0 in mean square for each t. This
implies that in the limit -only the mean reflection directions count. The
effects of the perturbations about the mean go to zero. As a first step in
the proof note that, because of the martingale properties,
m-1 2
Esup :Lazf O(h2 )E(number of reflection steps on[O, n))
m~n i=O

< O(h)E IE~,:-01 Llzfl·


(7.6)
The proof will be completed in Chapter 11.

5. 1.4 Examples
Example 1. Getting a transition function which is locally consistent with
the boundary reflection is often quite straightforward, as will be illustrated
by three simple examples. First consider the case where G is the rectangular
area in Figure 5.6. The reflection direction at the point x, denoted by r(x),
5. 7 Reflecting Boundaries 139

is the vector of unit length which has direction (2, 1). Let the state space
Sh be the regular h-grid. The simplest choice for r/'(x, y) is obviously
ph(x, x + (h, 0)) = ph(x, x + (h, h)) = 1/2. Here, we have obtained the
proper average direction by randomizing among two "allowed" directions.
The effects of the perturbations a.z~ about the mean value will disappear
ash -t 0.

Figure 5.6. Directions and transitions for Example 1.

Example 2. We next consider a problem for the set G in Figure 5.6,


but now the reflection direction is the unit vector in the direction (2, 3).
See Figures 5.7a,b. Here, several possibilities are of interest. Clearly, this
direction cannot be achieved as a convex combination of the vectors {1, 0)
and {1, 1). There are several alternative constructions. For example, {2, 3) is
in the convex cone generated by the vectors (1, 1) and {1, 2), and in fact we
can take ph(x,x + (h,h)) = ph(x,x+ (h,2h)) = 1/2, as in Figure 5.7a. An
alternative is to exploit the possibility of transitions between states in act.
For example, we can take r/'{x, x+(h, h)) = 2/3 and ph(x, x+(O, h)) = 1/3,
as in Figure 5.7b.

Figure 5.7a. Example 2, choice 1.


140 5. The Approximating Markov Chains

Figure 5.7b. Example 2, choice 2.

A slight variation of this problem appears in Figure 5.8, where r1 is the


reflection direction on {x: x1 = O,x2 > 0} and r2 is the reflection direction
on {x : x 2 = O,x 1 > 0}. The set r(O) is composed of the convex cone
generated by these directions. Thus, there are several possibilities for the
reflection directions at the points ( -h, 0), (-h, -h), (0, -h) E 8Gt. The
choices indicated in the figure are reasonable for this problem. The physical
origin of the problem often suggests appropriate directions in such cases
of nonuniqueness. This example is typical of those arising as "heavy traffic
limits," as in Chapter 8. For that case, the reflections serve the purposes
of maintaining the nonnegativity of the contents of the buffers and also
keeping them from exceeding their capacities.

Figure 5.8. Example 3 with a corner.

Example 3. Next consider the example illustrated in Figure 5.9, where


the boundary is "curved" and the directions depend on x.
5.8 Dynamic Programming Equations 141

Figure 5.9. Example 3.


Suppose that r( ·) is single valued, except at the corners. Consider the
point X E act. Going in a direction r(x) from x, we meet the line con-
necting the two points Y1 and Y2 in Sh at the point Yo ¢ Sh. To meet the
condition {7.4), define the transition function

h( ) Y21 - Y01 h( ) Yo1 - Yn


p x,yl = h ' p x,y2 = h '

where Yii is the j-th component of the vector Yi· It is straightforward to


define actsuch that the "randomizing" procedure just described can be
carried out for all points in act.
5. 7. 5 The reflected jump diffusion
This is treated in the same way as was the reflected diffusion. We need only
decide what to do if the process x( ·) leaves the set C because of a jump.
For the reflecting boundary problem, we suppose that x + q(x, p) E C and
that x + Qh(X, p) E C for x E C and p E A.

5.8 Dynamic Programming Equations


5. 8.1 Optimal stopping
In this section, we will rewrite the dynamic programming equations for
the problems in Chapter 2 in the notation of the controlled Markov chain
approximations of this chapter. These will be needed in the discussion of
numerical methods in Chapter 6. The functions k(·) and g(·) are assumed to
be bounded. We first treat the optimal stopping problem. The underlying
model is (3.1.1) or {3.1.2) or, equivalently, {3.1) or {6.1) but with the control
142 5. The Approximating Markov Chains

u( ·) dropped. For a stopping time T and a discount factor {3 > 0, let the
cost for x(·) be the discounted form of (3.2.1}:

Here, the only control is the choice of the stopping time.


For the Markov chain approximation, an appropriate cost when stopping
at N is

(8.2}

where we recall the notation t~ = 2:~,:01 ~tf and ~t~ = ~th(e~). From
Section 2.2, the dynamic programming equation for the infima of the costs
is

Due to the discounting, all the costs and equations are well defined. Any
acceptable approximation to e-f3tl.th(z) can be used; for example, if h is
small, then one can use either of
1
1 + {3~th(x) ·

Generally, for numerical purposes the process will be confined to some


compact set G. Suppose that we will be required to stop on first exit from
G, with a stopping cost g(x), if we have not decided to stop before that
time. To formulate this case, define

r' = inf{t: x(t) fl G}. {8.4}

Thus we require that the stopping times for x( ·) be no larger than r'. In
this case, {8.3) holds for x E Gh = G n Sh; otherwise,

(8.5)

Undiscounted Optimal Stopping. Continue with the setup of the last


paragraph, but set {3 = 0, and refer to Section 2.2.2. Neither the cost for
an arbitrary stopping time nor the dynamic programming equations are
necessarily well defined, and special conditions are needed. Suppose that
the stopping times must satisfy the condition below (8.4} and similarly for
the approximating chains. Then, formally, (8.3) can be written as
5.8 Dynamic Programming Equations 143

V(x) = { min [~P"(x, y)V•(y) + k(x)<~.t•(x),g(x)],


g(x), X ct Gh.
{8.6)
The dynamic programming equation is well defined if the analogue of the
conditions given in Section 2.2.2 hold. In particular, if (a) there is k0 > 0,
such that k(x) ~ ko all x, or (b) the mean time to the obligatory stopping
set is bounded for each initial condition x. A case where neither condition
holds is dealt with in the "shape from shading" example in Chapter 15.

Reflecting Boundary. Next, suppose that the process x( ·) is a "reflect-


ing" jump diffusion of the type dealt with in Section 5. 7, where the "in-
stantaneous" reflection constrains x( ·) to stay in a given compact set G.
The cost is still given by {8.1). Suppose that there is no obligatory stop-
ping set and that /3 > 0. Let aat denote the "reflecting boundary" for the
approximating chain, as in Section 5.7. Then the dynamic programming
equation is (8.3) for x E Gh, and otherwise

Vh(x) = I>h(x,y)Vh(y), X E aat. (8.7)


y

Because the reflection is "instantaneous," there is no discount factor in


(8.7). It is possible to eliminate the reflecting states from the equation
(8.3), as discussed below (8.23), but from the computational point of view
it is generally more convenient to keep them. If f3 = 0, then the dynamic
programming equation is well defined under condition (a) in the above
paragraph. In Chapter 8, there is a brief discussion of an example where a
cost is associated with the reflection.

The Interpolated Process ,ph (·). Recall the continuous parameter Mar-
kov chain interpolation 'f/!h(-) from Section 4.3 or Subsection 5.7.3. For a
stopping time r for 'f/!h (·), an appropriate analogue of the cost function
(8.1) is

By the results in Section 4.3, (8.8) is also the cost for the discrete parameter
chain (with r = r~ being the interpolated time which corresponds toN) if
the discount factor e-/JAth(x) is approximated by 1/[1+/3~th(x)]. Otherwise
it is an approximation, with an error which goes to zero as h---+ 0.
144 5. The Approximating Markov Chains

5.8.2 Control until exit from a compact set


We next treat a discounted cost problem with discount factor {3 ~ 0, where
the control stops on first exit T from the interior G0 of the compact set
G. Also, suppose that there is no reflecting boundary. Suppose that the
functions k(x, ·) and rP(x, yi·) are continuous in a: E U for each x, y. Let
u = {u~, n < oo} be an admissible control sequence and define Nh to be
the first exit time of the chain from the set G~ = Sh n G0 • Also, define 8G h
to be the "boundary" set of states which are reachable from states in ao
in one step under some control action. An appropriate cost for the chain
is:

w•(x,u) ~ E; [1' .-••:k(e~,u~)Ll.t~ +e-••t,g(e~.)J. (8.9)

Let u~ = u(e~) for a feedback control u(·). If the sum is well defined and
bounded for each x under u(.·), then Wh(x, u) satisfies the equation

Wh(x,u) =
L e-f3llth(x,u(x))ph(x, yiu(x))Wh(y, u) + k(x, u(x))ath(x, u(x))
y
(8.10)
for x E G~, and with the boundary condition

Wh(x,u) = g(x), for x E 8Gh. (8.11)


The dynamic programming equation for the optimal value function is

for x E G~ with the boundary condition (8.11). If the interpolation intervals


are small, then it might be convenient to replace the exponentials in (8.10)
and (8.12) by 1- {3ath(x, a:) or by 1/[1 + {3ath(x, a:)].
An appropriate cost function for the continuous parameter interpolation
1/Jh(·) is

Wh(x,uh) = E;h [1rh e-f3 8 k('I/Jh(s),uh(s))ds+e-f3rhg('I/Jh(rh))], (8.13)

where Th is the first escape time of 1/Jh(·) from G0 , and uh(·) is the con-
tinuous parameter interpolation of {u~, n < oo}. The remarks which were
made in the previous subsection concerning the equivalence of this cost
with that for the discrete parameter chain hold here also.

A Compact Vector Form of the Equations. For use in Chapter 6, it is


convenient to write the above equations in a more compact matrix-vector
5.8 Dynamic Programming Equations 145

form analogous to (2.2.5), (2.3.3), or (2.4.4). Let u( ·) be a feedback control.


Define the vectors Wh(u) = {Wh(x,u),x E CR} and yh = {Vh(x), x E
CR}. Define the cost vector Ch(u) = {Ch(x, u), x E CR} with components

(8.14)
for X E cg. Define the matrix Rh(u) = {rh(x,yiu(x));x,y E cg} where
(8.15)

for x E cg. Then we can write the equation for the cost (8.10) and the
dynamic programming equation (8.12) as

Wh(u) = Rh(u)Wh(u) + Ch(u), (8.16)


yh = min [Rh(u)Vh + Ch(u)]. (8.17)
u(x)EU

The minimum in the vector valued expression (8.17) is understood to be


taken component by component.

5. 8. 3 Reflecting boundary
Now suppose that the boundary is "instantaneously" reflecting as in Section
5.7, and the transition probabilities for the reflecting states do not depend
on the control. The approximating chain is assumed to be locally consistent
with. the diffusion (3.1) or the jump diffusion (6.1) in Ch = Sh n C, and
with the reflection directions r(x) on act, where the reflecting boundary
is disjoint from Ch. Recall that the interpolation intervals fl.th(x) equal
zero for points in aCt. An obligatory stopping set and associated stopping
cost can be added and will be commented on briefly below.
Suppose that the cost for (7.1) takes either the form (8.18a) or (8.18b):

E; loco e-f3t [k(x(t), u(t))dt + c'(x(t))dz(t)], (8.18a)

where c(·) is bounded and continuous and c'(x)'y 2: 0, for any"( E r(x), x E
aC. When Cis a convex polyhedron and (7.3) holds, then we use

E; leo e-f3t [k(x(t), u(t))dt + c'dY(t)], (8.18b)

where Ci 2: 0.
Let u = { u~, n < oo} be an admissible control sequence. Write the "con-
ditional mean" increment when the state e~ = X is in aCt as fl.zh(x) =
E~;::Lle~, and (for the case of polyhedral C and the representation (7.3))
146 5. The Approximating Markov Chains

analogously define ~Yh(x) = E!:~~Y,::.. Then, for this unstopped reflec-


tion problem, appropriate analogues for the chain of the cost functions
(8.18) are

w•(x, u) = E': {~ .-•·: [k(e~, ·~)t.t~ + c'(e::)t.z•(e~)[} . (8.19a)

(8.19b)

Let u~ = u(e~) for a feedback control u(·). Then, if the sum in (8.19)
is well defined and bounded, Wh(x, u) satisfies (8.10) for x E Gh. For
X E oGt and (8.19a) we have

Wh(x, u) = :~::>h(x, y)Wh(y,


y
u) + c'(x)~zh(x). (8.20)

The equation for the optimal value function is (8.12) for x E Gh, and for
x E aat it is

Vh(x) = I>h(x, y)Vh(y) + c'(x)~zh(x), (8.21)


y

with the analogous forms for the case of (8.19b)


The "reflection" properties of the approximating Markov chain models
the "instantaneous" reflection of the underlying diffusion or jump diffusion
model. The interpolation time interval for the reflection steps equals zero,
consistent with the instantaneous character of the reflection for (7.1), (7.2).
Hence, there is no discounting associated with that step.
Suppose that x( ·) is a reflecting diffusion process, but allow the interpo-
lation interval for the reflecting states to be nonzero and of the order of the
step size. Then the limit processes might be "sticky" on the boundaries;
i.e., spend positive time on the boundaries. An approximation for such a
case was discussed in [89], but we omit it here because it seems to occur
rarely in applications at this time.

A Vector Form of the Equations for the Cost. The formulas (8.16)
and (8.17) hold with appropriate redefinitions. Redefine the vector Wh(u) =
{Wh(x, u), X E Gh u aat} and similarly redefine the vector yh. For feed-
back u(·), redefine Rh(u) to have the components {rh(x,yiu(x)), x,y E
Gh u aat}, where (8.15) holds for X E Gh, and for X E aat use

(8.22)
5.8 Dynamic Programming Equations 147

Finally, redefine the vector Ch(u) with components {Ch(x,u),x E Ch U


act} where
{8.23)

with either Ch(x,u) = c'(x)~zh(x) or Ch(x,u) = c'~Yh(x) for X E act.


Then, (8.16) holds for the given control u(·) if the cost Wh(x,u) is well
defined and bounded and {8.17) holds if the minimum cost is well defined
and bounded.

Eliminating the Reflection States. The states in the reflection set act
can be eliminated from the state space if desired, because their transition
probabilities do not depend on the control and their interpolation intervals
are zero. But, from the point of view of programming convenience (the au-
tomatic and simple generation of the transition probabilities and intervals)
it seems to be simpler to keep these states. We will illustrate the method of
eliminating these states for one particular case, and this should make the
general idea clear. Refer to Figure 5.10.

Figure 5.10. Eliminating a reflection state.

State x2 is a reflection state. It is assumed to communicate only with


states xo, xh and it is reached only from states x0 , x3. To eliminate state x2,
we redefine the transition probability from xo to x 1 to be ph(xo,xlla) +
ph(xo, x2ja)ph(x2, xi). Analogously, define ph(x3, xola) and ph(x3, x1ja).
Suppose that states in act communicate only to states in Ch. Then, in
general, for x, y f/. act, use the redefinitions

ph(x,yJa) -t ph(x,yJa) + L ph(x,zla)ph(z,y).


zE8G~
148 5. The Approximating Markov Chains

Both Reflecting and Stopping Sets. The dynamic programming equa-


tions can easily be derived for the case where part of the boundary is
reflecting, but where there is also an obligatory stopping set with an asso-
ciated stopping cost g(·). Write aat for the discretization of the reflecting
boundary, as before. Divide the set Gh into disjoint subsets, G~ and 8G~,
where the latter represents a discretization of the stopping set. Suppose
that the reflecting boundary does not communicate with a~, the absorb-
ing boundary. Redefine the vectors Wh(u) = {Wh(x,u), x E G~ U 8Gt}
and Vh(u) = {Vh(x,u), X E G~ u aat}. The components of the vector
Ch(u) and matrix Rh(u) are defined by (8.14) and (8.15), respectively, for
X E G~, and by (8.22) and below (8.23) for X E aat. Then, if the costs are
well defined and finite, (8.16) and (8.17) continue to hold.

5.9 Controlled and State Dependent Variance


If the variance is controlled or is highly state dependent, more care needs to
be exercised in the construction of the algorithms. It might be impossible
to get locally consistent approximating chains with only local transitions.
Often one needs to strike a balance between complexity of coding and a
small level of "numerical noise." These issues will be illustrated in this sec-
tion. Suppose that in the approximations used in Section 5.3, the covariance
matrix a(·) depended on both x and a and that (3.12) holds as some point
(x,a). Then the approximating process of Section 5.3 can be used. If {3.12)
does not hold, and if the suggestions for relaxing the condition in Section
5.3 cannot be used, then one needs to use "non local" transitions. The
issues will be illustrated for the degenerate two dimensional model:
dx = a(x,a)dw, x(t) E JR?, (9.1)
where w( ·) is a real-valued Wiener process and a( x, a) is the vector with
transpose (a1(x, a), a2(x, a)), where O'i(x, a) ~ 0. The state space for the
approximating chain is a regular grid with spacing h in each direction. The
general problem will be illustrated by two examples. The variables k, with
or without affixes, are nonnegative integers, which might depend on (x, a).

Example 1. Fix (x, a). Suppose that the scaling and the ordering of
the components of the state are such that a(x, a) = {1, k + -y(x, a)) for
0 < -y(x,a) < 1. First, let us approximate the one step transition by ran-
domizing between the values
x --+ x ± [e 1 + e2k]h, with probability p!/2, each
x--+ x ± [e1 + e2(k + 1)]h, with probability P2/2, each,
where Pl + P2 = 1. Then
E~::: [e~+l -x] [e~+l -x]' = h 2 C(x,a), Ath(x,a) = h2 , {9.2)
5.9 Controlled and State Dependent Variance 149

where

C(x, a:)

We have C11 {x,a:) = al(x,a) = 1, and would like match the remaining
elements c12(x, a:) and c22{x, a:) with the analogous elements of a(x, a:).
But there is only one parameter that can be selected, namely P2· First
choose P2 so that C12(x,a:) = al(x,a:)a2(x,a:) = a12(x,a:). Then P2
'Y(x, a), and the relative numerical noise for the {2,2) component is
C22(x, a:) - a~(X, a:) _ ')'(X, a:){1 - 'Y(X, a:)) _ O (__!__). {9.3)
a~(x,a) - (k+')'{x,a:))2 - k2 ·
Alternatively, choosing P2 such that C22 {x, a:) = a~(x, a:) yields
_ ')'(X, a:){2k +')'(X, a:)} < ( )
P2- 2k+1 _')'x,a,
and the relative numerical noise for the (1,2) component is then
a1(X, a)a2(X, a:)- C12(x, a:) = ')'(X, a:}(1- ')'(X, a:)) = O (-1-) •
a1 (x, a)a2(x, a) (2k + 1)(k + 'Y(x, a:)) 2k2
Thus, although both procedures seem reasonable, we see that neither gives
local consistency if 'Y(x, a) is neither 0 nor 1, although the relative numer-
ical noise decreases rapidly as k increases. If we simply used P2 = 0 or 1,
then the relative numerical noise would be 0(1/k), which illustrates the
advantages of randomization.
The solution to the optimal control problem is often relatively insensitive
to small numerical noise, even up to 5-10% (which has an effect similar to
that due to adding noise to the dynamics). One must experiment.

Example 2. Fix (x, a:). Now, extending the above example, suppose that
the scaling is such that a(x,a:) = (kt.k2 + 'Y(x,a:)), where k2 2: k1. Con-
struct an approximating chain by using the transitions
x -t x ± [e1k1 + e2k2]h, with probability pt/2, each,
x -t x ± [e1k1 + e2(k2 + l)]h, with probability P2/2, each.
Defining the matrix C(x, a:) analogously to what was done above, we have
al(x,a) = Cu(x,a:) for any P2· Choosing P2 so that C12(x,a:) is equal to
a1 (x, a)a2(x, a:) yields P2 = 'Y(x, a) and the relative numerical noise for the
{2,2) component is
C22(x, ~)- a~(x, a:) = ')'(X, 0:)~1- 'Y(X, a:)) = O (~).
a 2(x, a:) a 2(x, a:) k2
150 5. The Approximating Markov Chains

If we simply set P2 = 0, then the relative numerical noise for the (2, 2)
component is

which again shows the advantage of randomization.


A general rule. The above examples suggest a useful general rule. This
rule is asymptotic, as h --? 0. But it can serve as a guide to the selection of h,
and guarantees the convergence Vh(x)--? V(x). Typically, one uses several
values of h. In any practical algorithm, one might have to accept a small
amount of (relative) numerical noise. As the calculations above show, the
numerical noise can only be reduced by using larger values of the ki, which
means that increasingly "nonlocal" are used. Thus the problem should be
scaled so that the number of grid points (the ki above) moved per step can
increase if needed as h decreases, so as to guarantee local consistency.
Fix (x, a), and suppose that a2(x, a) ;:::: a1(x, a). If the reverse inequality
holds, then use the obvious analogue of the procedure. First, if there are
integers kNx,a),k~(x,a) such that

a2(x, a)ja1(x, a)= k~(x, a)/k~(x, a), (9.4)


then use the transition

x--? x ± [e1k~(x, a)+ e2k~(x, a)]h, each with probability 1/2. (9.5)

l
Then the conditional covariance matrix (9.2) is

h 2C( x, a ) = h 2 [ (kf(x,a)) 2 kf(x,a)kq(x,a) =A h( ) ( )


k1h( x,a )k2h( x,a ) (kh( x,a )) 2
2
ut x, a a x, a ,
(9.6)
where the right hand equality is just the assertion of local consistency and
defines .dth(x, a). Using (9.4), the center term of (9.6) can be written as

(9.7)

which implies that

(9.8)

If there are no integers such that (9.4) holds, then choose kh --? oo
but such that khh --? 0. Then choose kNx, a), k~(x, a) such that they go
to infinity as h --? 0, are no greater than kh, and satisfy (which defines
/'~(x,a))

kq(x, a)+ 'Yq(x, a)


for 1 ;:::: 'Y~(x, a) ~ 0, (9.9)
kf(x,a)
5.9 Controlled and State Dependent Variance 151

and use the either of the following procedures.


First, consider the simplest procedure by letting the allowed transitions
be (9.5). Then the conditional covariance matrix in (9.2) is the central term
in (9.6), but the right hand equality in (9.6) no longer holds. The interval
A.th(x, a) can be selected such that equality will hold in one component of
(9.6). For example, if we set (compare with (9.8))

A h( ) _ h2(kq(x, a) + -yq(x, a)) 2


L.).t x,a - 2( ) , (9.10)
u 2 x,a

then there is local consistency in the (1, 1) component. The relative numer-
ical noise for the (2,2) component is then

c22(x, a)- ui(x, a) - 2k~(x, a)'Y~(x, a)+ 'Y~(x, a) - 0 ( 1 )


u~(x, a) - (k~(x, a)+ -y~(x, a))2 - k~(x, a) ·
(9.11)
Alternatively to the use of the transitions in (9.5), we can randomize,
using the allowed transitions

x---+ x ± [etkNx, a)+ e2k~(x, a)]h, each with probability ptf2,


x -;t x ± [etk~(x,a) + e2(k~(x,a) + 1)]h, each with probability P2/2.
(9.12)
Setting P2 = -y~(x, a), where -y~(x, a) is still defined by (9.9), yields

C11(x,a) (kNx,a)) 2
C12(x,a) = kf(x, a)(k~(x, a)+ -y~(x, a))
(9.13)
C21(x,a) kf(x, a)(k~(x, a)+ -y~(x, a))
c22(x,a) (kq(x, a)+ -y~(x, a)) 2 + -y~(x, a)(1 --y~(x, a))
Let A.th(x, a) be given by (9.10). Then there is local consistency in the (1, 1)
and (1, 2) components. The relative numerical noise in the (2,2) component
is
-y~(x, a)(1 --y~(x, a)) = 0 ( 1 ) (9.14)
(k~(x, a)+ -y~(x, a)) 2 (k~(x, a))2 ·
Thus, randomization is preferable, but it involves a more complex code.
Of course, in any application, one uses only a few fixed small values of h.
But the above comments serve as a useful guide. To apply the procedure in
the various numerical algorithms of the next chapter, the set U will usually
be approximated by a finite set by Uh.
6
Computational Methods for
Controlled Markov Chains

The chapter presents many of the basic ideas which are in current use for the
solution of the dynamic programming equations for the optimal control and
value function for the approximating Markov chain models. We concentrate
on methods for problems which are of interest over a potentially unbounded
time interval. Numerical methods for the ergodic problem will be discussed
in Chapter 7, and are simple modifications of the ideas of this chapter.
Some approaches to the numerical problem for the finite time problem will
be discussed in Chapter 12.
The basic problem and the equations to be solved are stated in Section
6.1. Section 6.2 treats two types of classical methods: the approximation in
policy space method and the approximation in value space method. These
methods or combinations of them have been used since the early days of
stochastic control theory, and their various combinations underlie all the
other methods which are to be discussed. The first approach can be viewed
as a "descent" method in the space of control policies. The second method
calculates an optimal n-step control and value function and then lets n go
to infinity. The Jacobi and Gauss-Seidel relaxation (iterative) methods are
then discussed. These are fundamental iterative methods which are used
with either the approximation in policy space or the approximation in value
space approach. When the control problem has a discounted cost, then one
can improve the performance of the iterations via use of the bounds given in
Section 6.3. The so-called accelerated Gauss-Seidel methods are described
in Section 6.4. This modification generally yields much faster convergence,
and this is borne out by the numerical data which is presented.
The possible advantages of parallel processing are as interesting for the
154 6. Computational Methods

control problem as for any of the other types of problems for which it has
been considered. Although it is a vast topic, we confine our remarks to the
discussion of several approaches to domain decomposition in Section 6.5.
The size of the state spaces which occur in the approximating Markov chain
models can be quite large and one would like to do as much of the compu-
tation as possible in a smaller state space. Section 6.6 discusses the basic
idea of grid refinement, where one first gets a rough solution on a coarser
state space, and then continues the computation on the desired "finer"
state space, but with a good initial condition obtained from the "coarse"
state space solution. Section 6. 7 outlines the basic multigrid or variable grid
idea, which has been so successful for the numerical solution of many types
of partial differential equations. This method can be used in conjunction
with all of the methods discussed previously. With appropriate adaptations
of the other methods, the result is effective and robust and experimental
evidence suggests that it might be the best currently available. There are
comments on the numerical properties throughout, and more such com-
ments appear in Chapter 7. In Section 6.8, the computational problem is
set up as a linear programming problem. The connections between approx-
imation in policy space and the simplex algorithm are explored and it is
shown that the dual equation is simply the dynamic programming equation.
Some useful information on computation and some interesting examples are
in [15, 19, 135, 150]. An expert system which incorporates some forms of
the Markov chain approximation method is discussed in [20, 21].

6.1 The Problem Formulation


The cost functionals or optimal cost functionals in Section 5.8 can all be
written in the following forms:

Wh(x, u) = L rh(x, yiu(x))Wh(y, u) + Ch(x, u(x)),

l'
y

v•(x) ~ ~f) [ ~=-·(x, YI<>)Vh(y) + c•(x, a)


for appropriate rh(x, ylo:) and Ch(x, o:). The vector forms are
(1.1)

and
(1.2)

respectively.
This chapter is concerned with a variety of methods which have been
found useful for the solution of these equations. Many of the methods come
6.1 The Problem Formulation 155

from or are suggested by methods used for the solution of the systems of
linear equations which arise in the discretization of PDE's, as, for example,
in [142].
Because we will generally be concerned with the solution of equations
[such as {1.1) and {1.2)] for fixed values of h, in order to simplify the
notation we will refer to the state space of the Markov chain simply asS
and rewrite these equations, with the superscript h deleted, as

W(u) = R(u)W(u) + C(u), {1.3)

V = min [R(u)V + C(u)], {1.4)


u(x)EU

where for a feedback control u(·), R(u) = {r(x,yiu(x)), x,y E S} is a


stochastic or substochastic matrix. When working with the variable grid
methods in Sections 6. 7 and 6.8, we will reintroduce the h superscript, so
that we can keep track of the actual and relative grid sizes. The following
assumptions will be used to assure that the equations {1.3) and {1.4) are
meaningful. Our aim is to outline the main computational techniques of
current interest, and not to get the best convergence results for iterative
methods for controlled Markov chains.

Al.l. r(x, yia), C(x, a) are continuous functions of a for each x andy in
s.
A1.2. (i) There is at least one admissible feedback control u 0 (-) such that
R(uo) is a contmction, and the infima of the costs over all admissible con-
trols is bounded from below. {ii) R(u) is a contmction for any feedback
control u( ·) for which the associated cost is bounded.

A1.3. If the cost associated with the use of the feedback controls u 1 (·), ... ,
un(-), ... in sequence, is bounded, then

Richardson Extrapolation. Suppose that the optimal cost for the origi-
nal control problem for the diffusion is V(x) and that for the Markov chain
~pproximation it is Vh(x). Suppose that the solutions are related by (for
appropriate values of x)

Vh(x) = V(x) + V1(x)h + o(h)


for a bounded sequence Vh(x). Then for small enough h, one can get a
more accurate solution by using
156 6. Computational Methods

While there is no current theory to support this practice in general for all
of the problems of interest in this book, it generally yields good results,
if reasonably accurate solutions for small enough parameters h and h/2
are available. The scheme is known as Richardson extrapolation [32]. The
general problem of selecting the approximation for best accuracy and the
appropriate extrapolations needs much further work.

6.2 Classical Iterative Methods: Approximation in


Policy and Value Space
In this section, we discuss the two classical methods. The first, called ap-
proximation in policy space was introduced by Bellman, then extended by
Howard [73] to simple Markov chain models, and later extended to quite
general models. It is somewhat analogous to a gradient procedure in the
space of controls. See the discussion concerning the relations between linear
programming and approximation in policy space in Section 6.8. The second
method, called calculates the optimal cost and control as the limits, as time
n goes to infinity, of the optimal costs and controls for the problems that
are of interest over a finite time interval [0, n] only. This method can also
be interpreted as a fixed point iteration for (1.2).

6.2.1 Approximation in policy space

Theorem 2.1. Assume (Al.l) and (A1.2). Then there is a unique solution
to (1.4), and it is the infimum of the cost functions over all time independent
feedback controls. Let uo( ·) be an admissible feedback control such that the
cost W(uo) is bounded. For n 2: 1, define the sequence of feedback controls
un(-) and costs W(un) recursively by (1.3) together with the formula

U.+t (x) ~ arg :::1!J [~ r(x, yla)W(y, u,.) + C(x, <>)]· (2.1)

Then W(un)--+ V.
Under the additional condition (A1.3), Vis the infimum of the costs over
all admissible control sequences.

Proof. To prove the uniqueness of the solution to (1.4), suppose that there
are two solutions V and V, with minimizing controls denoted by u( ·) and
u( ·), respectively. Recall that all inequalities between vectors are taken
component by component. Then we can write
n-1
V = Rn(u)V + L Ri(u)C(u). (2.2)
i=O
6.2 Classical Iterative Methods 157

The right hand side sum in (2.2), which is obtained by iterating (1.4), is
bounded. Hence, by (A1.2) R(u) is a contraction, and similarly for R(u).
Consequently, the right sum of (2.2) converges to the cost function W(u) =
V. We can write
min [R(u)V
u(x)EU
+ C(u)]
(2.3a)
R(u)V + C(u):::; R(u)V + C(u),

min [R(u)V + C(u)]


u(x)EU
(2.3b)
R(u)V + C(u):::; R(u)V + C(u).
The equalities and inequalities (2.3) yield
R(u)(V- V):::; V- v:::; R(u)(V- V).
Iterating this inequality and using the contraction properties of the R(u)
and R(u) implies that V = V.
By a contraction type of argument such as the one just used, it can be
shown that if a solution to (1.4) exists, then it is the infimum of the costs
W(u) for feedback controls which do not depend on time. Under (A1.3),
it can be shown that any solution to (1.4) is also the infimum over all
admissible controls, and we omit the details.
We now prove that W(un)--+ V. By the definition of W(un) in (1.3) and
the minimizing operation in (2.1), we have
R(un)W(un) +C(un)
> R(un+l)W(un) + C(un+l)
m-1 (2.4)
> Rm(un+l)W(un) + L Ri(un+I)C(un+d·
i=O
As before, (A1.2) implies that R(un+l) is a contraction. Hence, as m--+ oo,
the first term on the far right hand side goes to zero and the sum converges
to the cost W (Un+l). Thus W (Un+l) :::; W (Un) and there is a vector W
such that W(un)-!- W. The inequality
W(un) ;::-: min [R(u)W(un)
u(x)EU
+ C(u)]
implies that
W ;::-: min [R(u)W + C(u)]. (2.5)
u(x)EU
Conversely, the inequality
W(un+d :::; R(un+l)W(un) + C(un+l) = min [R(u)W(un) + C(u)]
u(x)EU

implies that
W:::; min [R(u)W
u(x)EU
+ C(u)]. (2.6)

Inequalities (2.5) and (2.6) imply that W satisfies (1.4). •


158 6. Computational Methods

6.2.2 Approximation in value space

The Jacobi Iteration.

Theorem 2.2. Let u(·) be an admissible feedback control such that R(u) is
a contraction. Then for any initial vector Wo, the sequence Wn defined by

Wn+l(x,u) = Lr(x,ylu(x))Wn(Y,u) +C(x,u(x)) {2.7)


y

converyes to W(u), the unique solution to {1.3). Assume {Al.l)-{Al.3).


Then for any vector V0 , the sequence recursively defined by

Vn+l = min [R(u)Vn


u(x)EU
+ C(u)] {2.8)

converyes to V, the unique solution to {1.4). In detail, {2.8) is

Vn+'(x) ~ ~'!) [~>(x, YI<>)Vn(Y) + C(x, <>)]· (2.9)

Vn is the minimal cost for an n-step problem with terminal cost vector V0 •

Proof. The convergence of Wn(u) is just a consequence of the fact that


R( u) is a contraction. It will be shown next that Vn is the minimal n-step
cost for the control problem with terminal cost V0 . Let un(-) be minimizing
in {2.8) at step n. We can write

V1 = R(u 1 )Vo + C(u 1 ),


n
Vn R(un)···R(u 1 )Vo+ LR(un)···R(ui+ 1 )C(ui),
i=l

which is then-step cost for the policy which uses ui(·) when there are still
i steps to go and with terminal cost Vo. In the above expression, the empty
product TI!+l is defined to be the identity matrix. The minimizing prop-
erty in {2.8) yields that for any other admissible feedback control sequence
{un(·)} for which the cost is bounded, we have

Vn+l ~ R(un+ 1 )Vn + C(un+l).

Iterating the last inequality yields


n+l
Vn+l ~ R(un+l) · · · R(u 1 )Vo + L R(un+ 1 ) • • • R(ui+ 1 )C(ui)

i=l

which is the cost for an n + 1-step process under the controls {ui(-)} and
terminal cost V0 • Thus, Vn is indeed the asserted minimal n-step cost.
6.2 Classical Iterative Methods 159

According to Theorem 2.1, there is a unique solution to (1.4). Let u(-)


be a minimizer in (1.4). Then

R(un+ 1 )Vn + C(un+l) = Vn+l ~ R(u)Vn + C(u),

R(u)V + C(u) = V ~ R(un+l)V + C(un+l),


which implies that

By iterating this latter set of inequalities,

The boundedness of {Vn} follows by the contraction property of R(u).


Thus, by (A1.3) we have

and Vn ~ V. •

The Gauss-Seidel or Successive Approximation Iteration. The it-


eration in (2.7) or (2.8) calculates all the values of the components of
the (n + 1)-st iterates Wn+l(u) and Vn+b respectively, directly from the
components of Wn(u) and Vn, respectively. The newly calculated values
Wn+l(x,u) and Vn+l(x) are not used until they are available for all xES.
If the computation uses successive substitutions, in the sense that each
newly calculated term Wn+l(x,u) or Vn+l(x) is immediately substituted
for the Wn(x) or Vn(x), respectively, in the formulas (2.7) or (2.8), the pro-
cedure is then referred to as the Gauss-Seidel procedure. In particular, let
us order the states in the state space S in some way, and let the inequality
sign <denote the ordering. Then we have the following algorithm [104]:

Theorem 2.3. Let u(·) be an admissible feedback control for which R(u)
is a contraction. For any given Wo, define Wn recursively by

Wn+l(x,u) =
L r(x,yJu(x))Wn+l(y,u) + L r(x,yJu(x))Wn(y,u) + C(x,u(x)).
y<x y~x
(2.10)
Then Wn converges to the unique solution to (1.3).
Assume (A1.1)-(A1.3). For any Vo, define Vn recursively by

Vn+l(x) = ~Jf: [L
y<x
r(x,yJa)Vn+l(Y) + L r(x,yJa)Vn(Y) + C(x,a)].
y~x

(2.11)
160 6. Computational Methods

Then Vn converges to the unique solution of (1.4).

Remark. It is useful to note that there are f(x, ylu) and C(x, u) such that
(2.10) can be written in the form

Wn+l (x, u) = L f(x, ylu)Wn(Y, u) + C(x, u). (2.12)


y

Let x(1) denote the lowest state in the ordering. Then the terms f(x,ylu)
and C(x, u) in (2.12) are defined in the following recursive way:

f(x(1),ylu) = r(x(1),ylu(x(1))),
f(x,ylu) = r(x,ylu(x)) +L r(x,zlu(x))f(z,ylu), y ~ x ~ x(1),
z<x
f(x, ylu) = L r(x, zlu(x))f(z, ylu), x > y ~ x(1),
z<x
(2.13)
and
C(x, u) = L r(x, ylu(x))C(y, u) + C(x, u(x)). (2.14)
y<x

The proofs of the theorem and of the representation (2.12)-(2.14) are


in [88] or [105], which also contain discussions of the numerical proper-
ties relative to the Jacobi procedure, and the second reference is the first
place where the optimal control form of the Gauss-Seidel (and accelerated)
method appeared. The Gauss-Seidel method of Theorem 2.3 is never in-
ferior to the Jacobi method of Theorem 2.2, and it requires less storage
space. Some numerical properties and comparisons will be presented be-
low. Combinations of the Gauss-Seidel and Jacobi procedures are useful
in implementations on parallel processors, as noted in Section 6.5 below.
Iterations such as (2.7)-(2.11) are sometimes referred to as relaxations and
this term will be used interchangeably with the term iteration.

6. 2. 3 Combined approximation in policy space and


approximation in value space
The approximation in policy space, as stated in Theorem 2.1, requires
the solution to (1.3) for each successive control unO· Because the state
spaces S tend to have many points (perhaps in the millions in some cases
[103]), obtaining a good approximation to the solution to (1.3) for each
control unO might be an onerous task. On the other hand, the "backwards
iteration" or approximation in value space method of Theorem 2.2 or 2.3
might require much calculation at each step in order to get the minima,
generally has slower rate of convergence and does not allow the use of
multigrid type methods, which have been found to be quite powerful.
6.2 Classical Iterative Methods 161

Typical computational methods use the approximation in policy space as


a basis, but get only an approximation to the values W(un) at each step.
For example, let Un ( ·) be the current candidate for the optimal control,
and let Wn be an approximation to W(un)· Calculate the next candidate
Un+t(·) for the optimal control via (2.1) with Wn replacing W(un)· Then,
starting with the initial guess Wn, obtain an approximation Wn+l to the
solution W(un+d to

For example, one might use a sequence of say N Gauss-Seidel relaxations,


starting with initial value Wn and define Wn+l to be the value obtained at
the end of that sequence. Then repeat the procedure until some criterion
of convergence is satisfied. The extreme cases N = 1 and N = oo are,
of course, the approximation in value and approximation in policy space
methods, respectively. Such a procedure emphasizes the necessity for ex-
amining efficient methods for the solution of systems of linear equations,
such as the acceleration, multigrid or aggregation methods introduced in
the following sections and appropriate choices among these methods are
generally preferred to the use of the unaltered Gauss-Seidel relaxation.
Suppose that one considers the computing time required to reach a given
error in the estimate of the V or in the optimal control versus the number N
of relaxations used between policy updates. The details of the graph would
depend on the difficulty of getting the minimum in (2.1). But in general, as
N increases it first drops and ultimately increases. It has been observed in
a variety of two dimensional problems where the time required to get the
minimum in (2.1) for all x E S was of the order of two or three times the
time required for a single relaxation that the best value of N varied between
five and fifteen, which is not a large range. Generally, some experimentation
is required. The procedure which uses several Jacobi relaxations between
each policy update is known as the modified policy iteration algorithm and
convergence proofs for the discounted problem are in [125].

6.2.4 The Gauss-Seidel method: Preferred orderings of the


states
The Jacobi procedure of Theorem 2.1 does not depend on the way the states
are ordered. However, the performance of the Gauss-Seidel procedure does
depend on the ordering. If the diffusion effects dominate those of the drift
terms in the original process from which the Markov chain approximation
was derived and the diffusion is nondegenerate, then the orderings in suc-
cessive iterations might be alternated in a fashion such as illustrated in
Figures 6.1a,b.
162 6. Computational Methods

Figure 6.la. Directions of iteration.

If the diffusion is degenerate and there is a dominant flow direction due


to the effects of the drift term, then the performance is improved if these
effects are accounted for in the ordering. Only a few comments will be made
in order to give the general idea. Consider the special case
dx1 = x2dt,
dx2 = b2(x, u(x) )dt + adw,
and refer to Figures 6.2a,b. For x2 > 0, x1 increases, and conversely for
x2 < 0. Thus the general mean flow is qualitatively as indicated in Figure
6.2a.

Figure 6.1 b. Directions of iteration.


6.2 Classical Iterative Methods 163

Figures 6.2a,b. Directions of the flow.

It should be noted that in the early stages of either the approximation


in policy space or the approximation in value space method, the actual
control which appears can vary quite a bit from iteration to iteration. It is
therefore preferable to order the states in a way that does not depend on
the control, although the control can be taken into account once it "settles
down."
The following rule seems to be useful, where applicable: Order the states
to "tighten" the connection with the absorbing states, going against the
"tendency of the flow" where possible. For the above example, if the outer
boundary is absorbing, then a reasonable ordering is shown in Figure 6.2c.
Here we ignored the component of flow in the "vertical" direction. If the
general tendency [due to the effects of the b2(x, u(x)) term] were as in
Figure 6.2b, then one might alternate between the orderings in Figures
6.2c and 6.2d.

Figures 6.2c,d. Directions of iteration.

More detail concerning the mathematics of the role of the orderings in
the convergence is in [88]. Some intuitive justification for the recommended
orderings can be seen from the degenerate example, where σ = 0. Suppose
that the outer boundary is absorbing. If it is possible to order the states
in such a way that the iteration is against the flow, then convergence will
be obtained in one sweep. This is equivalent to the approach to solving the
Hamilton-Jacobi equation by integrating backwards along the characteris-
tic curves. Following this analogy, we can see why the method works well
when there is an absorbing boundary and when it is possible to order the
states such that the iteration is against the tendency of the flow and moves
backwards from the absorbing set. This observation has significant implica-
tions for the numerical solution of deterministic optimal control problems,
and as we will see in Chapter 15, allows one to construct approximations
and associated iterative solvers that are remarkably efficient.
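The one-sweep behavior for the degenerate case can be illustrated with a few lines of code; the deterministic chain below, in which every state moves one step toward an absorbing boundary, is a hypothetical example constructed only to show that a single sweep against the flow is exact.

```python
import numpy as np

# Deterministic chain on states 0..m-1: from state x the chain moves to x+1,
# and state m-1 is absorbing.  The fixed-control cost equation is
# W(x) = C(x) + W(x+1), with W(m-1) = 0.
m = 6
C = np.arange(1.0, m + 1)        # hypothetical running cost per state
C[-1] = 0.0                      # no cost at the absorbing state

# Gauss-Seidel sweep "against the flow": visit states m-1, m-2, ..., 0.
# Each state needs only values that were already updated, so one sweep is exact.
W = np.zeros(m)
for x in reversed(range(m - 1)):
    W[x] = C[x] + W[x + 1]

# Exact solution by direct summation, for comparison.
W_exact = np.array([C[x:-1].sum() for x in range(m)])
print(np.allclose(W, W_exact))   # True: one backward sweep suffices
```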

6.3 Error Bounds for Discounted Problems


6.3.1 The Jacobi iteration
When the problem of interest has a discounted cost function, and the Ja-
cobi or Gauss-Seidel procedure is used to solve (1.3) or (1.4), one can obtain
additional estimates of the actual solutions to these equations. These es-
timates can be used to improve the values given by the current iterates.
First, the Jacobi case will be dealt with, and the control will be fixed (as
will be the case when we need to estimate W(un) for a cycle of the iteration
in policy space method).
Suppose that (1.3) can be written in the form

W(u) = γR0(u)W(u) + C(u),   (3.1)

where R0(u) is a stochastic or substochastic matrix and 0 < γ < 1.
For example, if (1.3) or (1.4) arose from the discounted problem (5.8.10),
then (reintroducing the h superscript for the moment) r^h(x, y|u(x)) =
e^{−βΔt^h(x,u(x))} p^h(x, y|u(x)). Define γ by

γ = max_x e^{−βΔt^h(x,u(x))}.

Then (dropping the h again), (5.8.10) can be written as (3.1) with an
appropriate matrix R0(u), whose row sums are no greater than one.
Let the vector Wn denote the n-th estimate of the solution of (3.1).
Define the "Jacobi" relaxation T_{J,u}(X, C) by T_{J,u}(X, C) = γR0(u)X +
C(u). Then T_{J,u}(Wn, C(u)) is an "updated" estimate of W(u). Next, some
estimates of W(u) will be stated and they will allow us to improve this
latest estimate T_{J,u}(Wn, C(u)). Define

δmax = max_x [T_{J,u}(Wn, C(u))(x) − Wn(x)],
δmin = min_x [T_{J,u}(Wn, C(u))(x) − Wn(x)].

We can now state [11, pp. 190-192] the following theorem.

Theorem 3.1. Under the above conditions, we have

W(x, u) ≤ T_{J,u}(Wn, C(u))(x) + γδmax/(1 − γ) ≤ Wn(x) + δmax/(1 − γ),   (3.2)

W(x, u) ≥ T_{J,u}(Wn, C(u))(x) + γδmin/(1 − γ) ≥ Wn(x) + δmin/(1 − γ).   (3.3)

Applying Theorem 3.1 to the Improvement of the Current Trial
Solution Wn to (1.3). To improve the current estimate Wn of the solution
to (3.1), choose any vector Wn+1 satisfying

T_{J,u}(Wn, C(u))(x) + γδmin/(1 − γ) ≤ Wn+1(x) ≤ T_{J,u}(Wn, C(u))(x) + γδmax/(1 − γ).   (3.4)

For example, for each x choose the value which is at the midpoint of the
range in (3.4).
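A sketch of the Jacobi relaxation combined with the bounds of Theorem 3.1 and the midpoint choice of (3.4) follows; the matrix R0, cost vector C, and discount γ are hypothetical stand-ins for the data of (3.1).

```python
import numpy as np

def jacobi_with_error_bounds(R0, C, gamma, W0, n_iter=50):
    """Iterate by choosing W_{n+1} at the midpoint of the range (3.4), where
    T_J(W) = gamma * R0 @ W + C is the Jacobi relaxation of (3.1)."""
    W = W0.copy()
    for _ in range(n_iter):
        TW = gamma * R0 @ W + C                 # T_{J,u}(W_n, C(u))
        d = TW - W
        d_max, d_min = d.max(), d.min()         # delta_max and delta_min
        lower = TW + gamma * d_min / (1.0 - gamma)
        upper = TW + gamma * d_max / (1.0 - gamma)
        W = 0.5 * (lower + upper)               # midpoint of the range in (3.4)
    return W, lower, upper

# hypothetical substochastic R0 and cost C
R0 = np.array([[0.0, 0.9], [0.5, 0.4]])
C = np.array([1.0, 2.0])
gamma = 0.95
W, lo, hi = jacobi_with_error_bounds(R0, C, gamma, np.zeros(2))
W_true = np.linalg.solve(np.eye(2) - gamma * R0, C)
print(np.all(lo <= W_true + 1e-9), np.all(W_true <= hi + 1e-9))  # bounds hold
```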

6.3.2 The Gauss-Seidel procedure


The estimates of Theorem 3.1 can also be used to improve the performance
of the Gauss-Seidel iteration for (3.1). To do this, we need to represent
the Gauss-Seidel iteration for (3.1) in a form analogous to (2.12) for some
appropriate matrix R̃0(u) = {r̃0(x, y|u(x)), x, y ∈ S}. Given the n-th esti-
mate Wn of the solution to (3.1), the Gauss-Seidel procedure updates the
estimate via the iteration

T_{GS,u}(Wn, C(u))(x) = γ[Σ_{y<x} r0(x, y|u(x)) Wn+1(y) + Σ_{y≥x} r0(x, y|u(x)) Wn(y)] + C(x, u(x)).   (3.5)
Referring to (2.13) and (2.14), rewrite the iteration (3.5) in the form

T_{GS,u}(Wn, C(u))(x) = γ Σ_y r̃0(x, y|u) Wn(y) + C̃(x, u),   (3.6)

where we sequentially (in x) define [see (2.13) and (2.14)]

r̃0(x, y|u) = r0(x, y|u(x)) + γ Σ_{z<x} r0(x, z|u(x)) r̃0(z, y|u),   y ≥ x,
r̃0(x, y|u) = γ Σ_{z<x} r0(x, z|u(x)) r̃0(z, y|u),   y < x,   (3.7)

C̃(x, u) = C(x, u(x)) + γ Σ_{y<x} r0(x, y|u(x)) C̃(y, u).   (3.8)

Now write (3.6) in the vector form

T_{GS,u}(Wn, C(u)) = γR̃0(u)Wn + C̃(u).   (3.9)

With this representation, Theorem 3.1 can be applied directly, by replacing
the T_{J,u} by T_{GS,u}. The R̃0(u) or C̃(u) need not be explicitly computed;
only (3.5) and the Gauss-Seidel equivalents of δmax and δmin need be computed.
A very similar improvement procedure can be given for the iteration in
value space forms (2.9) and (2.11). See [11, pp. 190-192] for the formulas.

6.4 Accelerated Jacobi and Gauss-Seidel Methods


6.4.1 The accelerated and weighted algorithms
Consider the linear problem (1.3), where the matrix R(u) is a contraction.
Again, let us order the states in some way, so that the Gauss-Seidel method
is well defined. Let Ω denote the diagonal matrix with the given entries
w(x) > 0, x ∈ S. Given W0, an initial estimate of the solution to (1.3), let
Wn denote the sequence of estimates of the solution which is obtained by
the following recursion:

Wn+1 = Ω[R(u)Wn + C(u)] + (I − Ω)Wn.   (4.1)

If w(x) > 1 for all x, then this procedure is known as the accelerated Jacobi
(AJ) method, where the w(x) are the acceleration parameters. If w(x) < 1,
then it is called the weighted Jacobi procedure [16].
There are two forms of the accelerated Gauss-Seidel method, depending
on whether the "acceleration" is done at the end of a Gauss-Seidel sweep or
continuously during the sweep. Let w(x) > 1. The first, which we call the
semi-accelerated Gauss-Seidel method (SAGS), is defined by the recursion

Ŵn+1(x) = Σ_{y<x} r(x, y|u(x)) Ŵn+1(y) + Σ_{y≥x} r(x, y|u(x)) Wn(y) + C(x, u(x)),
Wn+1(x) = w(x) Ŵn+1(x) + (1 − w(x)) Wn(x).   (4.2)

The full accelerated Gauss-Seidel method (AGS) is defined by

Wn+1(x) = w(x)[Σ_{y<x} r(x, y|u(x)) Wn+1(y) + Σ_{y≥x} r(x, y|u(x)) Wn(y) + C(x, u(x))] + (1 − w(x)) Wn(x).   (4.3)

In applications, one generally lets w(x) = w ≥ 1, a constant, and we will
do this henceforth. The algorithm (4.3) is also referred to as a successive
overrelaxation method (SOR) [63, Section 10.1.4].
For appropriate values of the acceleration parameter w, the accelerated
Gauss-Seidel methods are preferable to the original Gauss-Seidel method
(where w = 1). For the case (4.1), the acceleration is equivalent to a shift in
the eigenvalues of the matrix R(u). To see this most clearly, let us rewrite
(4.1) in vector form: Define the matrix Ω = wI, where I is the identity
matrix, write the iteration (4.1), and define the matrix R_w(u) by

Wn+1 = [wR(u) + (1 − w)I]Wn + wC(u) = R_w(u)Wn + wC(u).   (4.4)

There is a similar expression for (4.2), where R(u) is replaced by the R̃(u)
defined in (2.13). If λ is an eigenvalue of R(u), then wλ + (1 − w) is an eigen-
value of R_w(u). The idea is to choose a value of w such that the spectral
radius of R_w(u) is less than that of R(u). The accelerated Jacobi procedure
is not generally useful, but the accelerated Gauss-Seidel is. Examples will
be given below. It will be seen below that the AGS is actually the best pro-
cedure among these choices, in the sense that it gives the smallest spectral
radius with appropriate choices of the parameter w.
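For concreteness, a sketch of the AGS (SOR) sweep (4.3) for a fixed control is given below; the matrix R, the cost vector C, and the weight w are hypothetical, and the sweep order is simply the natural ordering of the states.

```python
import numpy as np

def ags_sweep(R, C, W, w):
    """One full accelerated Gauss-Seidel (SOR) sweep of (4.3):
    W_{n+1}(x) = w*[sum_{y<x} r(x,y)W_{n+1}(y) + sum_{y>=x} r(x,y)W_n(y) + C(x)]
                 + (1-w)*W_n(x), sweeping the states in increasing order."""
    W_new = W.copy()
    for x in range(len(C)):
        gs = R[x, :x] @ W_new[:x] + R[x, x:] @ W[x:] + C[x]
        W_new[x] = w * gs + (1.0 - w) * W[x]
    return W_new

# hypothetical contraction R (a discounted transition matrix) and cost vector C
R = 0.95 * np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
C = np.array([1.0, 0.5, 2.0])
W = np.zeros(3)
for _ in range(200):
    W = ags_sweep(R, C, W, w=1.3)
print(W, np.linalg.solve(np.eye(3) - R, C))  # the two should agree
```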
Generally some experimentation is needed to get appropriate values of
the factor w. This is often easier than it might seem at first sight. For the
approximation in policy space method, one solves a sequence of problems
of the type (1.3). For many problems, it has been experimentally observed
that the range of possible w (those for which the algorithm is stable) as well
as the optimal values of w do not change much from one control to another.
This point will also be pursued further in the example of Subsection 6.4.3.
Also, in doing numerical optimization, one frequently wants to solve a fam-
ily of optimization problems with closely related cost functions or system
dynamics, in order to get controls which are reasonably robust with respect
to possible variations in these quantities. Experimentation with the value
of w in the early stages of such a procedure generally yields useful values for
the rest of the procedure. Accelerated procedures have also been used with
the nonlinear iteration in value space algorithm (1.4). See [105], which first
introduced these "acceleration" methods for the computation of optimal
controls.

6.4.2 Numerical comparisons between the basic and accelerated procedures
General Comments and Comparisons. Some of the salient features
of and relations between the Jacobi and Gauss-Seidel procedures and their
accelerated versions can be seen by means of a simple example. The Markov
chain in this example is a locally consistent approximation of the one di-
mensional system
dx = cxdt + dw, c > 0.
The set G is the unit interval [0, 1], and the endpoints 0 and 1 are in-
stantaneously reflecting. 1/h is assumed to be an integer. Let the state
space of the approximating chain, which includes the reflecting endpoints,
be {0, h, ... , 1- h, 1}. Because the endpoints are instantaneously reflecting,
they communicate with their nearest neighbors with probability one and
"zero delay." Thus, they can be eliminated from the problem. We will do
this and denote the reduced state space by S = {h, ..., 1 − h}. Let λ ∈ (0, 1)
denote a discount factor and P the matrix of transition probabilities, where
p(x, x ± h) = (1 ± hcx)/2, except for the transition of the endpoints h and
1 − h to themselves. Let hc ≤ 1. For some given vector C, we wish to solve
the equation

W = λPW + C.   (4.5)

Recalling the discussion in Section 5.2, we write (4.5) in the "normalized"
form

W(x) = λ Σ_{y≠x} [p(x, y)/(1 − λp(x, x))] W(y) + C(x)/(1 − λp(x, x)),   (4.6)

which we rewrite again with the obvious definitions of P_N and C_N as
W = λP_N W + C_N. In this example, p(x, x) = 0 unless x equals h or 1 − h.


The Gauss-Seidel and Jacobi methods and their variations can be used
with either the normalized equation (4.6) or the unnormalized equation
(4.5). The various transition matrices are tabulated below for the case of
c = 0, h = 1/5, A = .995. Thus, in Tables 4.1 to 4.4 the state space has
four points {.2, .4, .6, .8}. For the Gauss-Seidel case, the sweep is from left to
right. The general comparisons below hold true for any value of c for which
the p(x, y) are nonnegative. The subscript N denotes that the normalized
transition probabilities are used .
.4975  .4975  0      0
.4975  0      .4975  0
0      .4975  0      .4975
0      0      .4975  .4975
Table 4.1. The matrix λP.

0 .9900 0 0
.4975 0 .4975 0
0 .4975 0 .4975
0 0 .9900 0
Table 4.2. The matrix λP_N.

.4975 .4975 0 0
.2475 .2475 .4975 0
.1231 .1231 .2475 .4975
.0613 .0613 .1231 .7450
Table 4.3. The matrix for the Gauss-Seidel iteration using λP.

0 .9900 0 0
0 .4925 .4975 0
0 .2450 .2475 .4975
0 .2426 .2450 .4925
Table 4.4. The matrix for the Gauss-Seidel iteration using λP_N.

Because the case at hand, where the states are ordered on the line, is so
special, we need to be careful concerning any generalizations that might be
made. However, certain general features will be apparent. With the Gauss-
Seidel forms, the state transitions are not just to nearest neighbors, but to
all the points to which the neighbors are connected, in the direction from
which the sweep comes.
The eigenvalues of the matrix λP are in the interval [−.995, .995], those
of λP_N are in [−.9933, .9933], those of the matrix in Table 4.3 in [0, .9920],
and those of the matrix in Table 4.4 in the interval [0, .9867]. (They are all
real valued.) The eigenvalues of the matrices used for the Jacobi iterations,
λP and λP_N, are real and symmetric about the origin in this case. The
"shift of eigenvalues" argument made in connection with (4.4) above or
(4.7) below implies that the use of (4.1) is not an improvement over the
Jacobi method if w > 1. The "one sided" distribution of the eigenvalues
for the Gauss-Seidel case suggests that the methods (4.2) and (4.3) might
yield significant improvement over the basic methods.
To see this point more clearly, consider the same example as above, but
with the smaller value of the difference interval h = 1/20. The results
are tabulated in the tables below. where the subscript N denotes that the
normalized matrix PN is used as the basis of the calculation of the matrix
for the Gauss-Seidel or accelerated cases, as appropriate. For this case, the
spectral radius of the Jacobi matrix >.Pis 0.9950, that for the normalized
Jacobi matrix >.PN is 0.9947, and that for the normalized Gauss-Seidel case
170 6. Computational Methods

is 0.9895. For the accelerated procedure, we have the following spectral radii
for the listed cases

w 1.7 1.5
SAGSN .9821 .9842
AGS .9650 .9766
AGSN .9455 .9699
Table 4.5. Spectral radii for the accelerated procedures.

The best procedure is the full AGSN, although the acceleration obtained
here with AGSN is greater than one would normally get for problems in
higher dimensions. Data for a two dimensional case in [104] support this
preference for AGS in the case of the nonlinear iteration in value space
algorithm also.

Comments on the Choice of the Weight w. The AGS method is


particularly hard to analyze because it is nonlinear in the parameter w. To
get some idea of a bound on the range of the parameter, consider the AJ or
SAGS, where the matrix in the iteration can be written as wR + (1- w)I,
for some stochastic or substochastic matrix R. Suppose that the spectrum
of R is in the interval [-a, b], where a and b are nonnegative. Then, by the
"eigenvalue shift" argument, the best weight would equalize the absolute
value of the shifted values of b and -a. This yields

wb + (1- w) = wa- (1- w),

which yields

w = 2/[2 − (b − a)].   (4.7)

Thus for larger (b − a), use a larger value of w.
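As a small numerical check of (4.7), assuming only that the spectrum of R lies in [−a, b]: the shifted endpoints wb + (1 − w) and −wa + (1 − w) are equalized in absolute value by the weight below (the values of a and b are arbitrary test values).

```python
# Illustration of the weight choice (4.7); a and b are arbitrary test values.
a, b = 0.3, 0.9                       # spectrum of R assumed to lie in [-a, b]
w = 2.0 / (2.0 - (b - a))             # the weight from (4.7)

hi = w * b + (1.0 - w)                # shifted image of b
lo = -w * a + (1.0 - w)               # shifted image of -a
print(w, hi, lo)                      # |hi| == |lo|, and both are below max(a, b)
assert abs(abs(hi) - abs(lo)) < 1e-12
```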

6.4.3 Example
Suppose that the approximation in policy space algorithm of Theorem 2.1
is used to solve (1.4). Then a sequence of solutions un(·) is generated, and
one wants to get an estimate of the solution to (1.3) for each such control.
We comment next on experimental observations of the sequence of optimal
acceleration parameters. The Markov chain is obtained as an approximation
to the following system, where x = (x1, x2) and |u| ≤ 1:

dx1 = x2 dt,
dx2 = (b2(x) + u) dt + dw.   (4.8)

The state space is defined to be the square G = {x : |xi| ≤ B, i = 1, 2}
centered about the origin. We let the outer boundary of the square be
reflecting and stop the process at the first time τ that the strip {x : |x1| ≤
δ < B} is hit. Let the cost be W(x, u) = E_x^u ∫_0^τ [1 + |u(s)|] ds. To start the
approximation in policy space method, suppose that the initial control u0(·)
is identically zero, get an approximation to the cost W(u0), and continue
as in Theorem 2.1.
Suppose that we use the AGS method to get approximate solutions to
the linear equations (1.3) for the costs W(un). The range of acceleration
parameters for which the algorithm is stable and the best acceleration
parameter were observed to decrease slightly as n increases, but not by
much. For n = 0, the best weight is slightly greater than 1.3. This would
be unstable for the optimal control, for which the best parameter is about
1.25. Even when accelerated, the convergence of the AGS methods for (1.3)
was observed to be slower for the initial control than for the optimal control.
Let u(·) denote the optimal control. Then to solve (1.3) to within a given
precision takes more than five times longer with R(u0) than with R(u).
These observations hold true for various functions b2(·). The conclusions
are problem-dependent, but analogous patterns appear frequently.
For this example, the process is absorbed faster for the optimal con-
trol than for the initial members of the sequence of controls generated by
the policy iteration method. Intuitively, faster absorption means a smaller
spectral radius. For other cases, the best acceleration parameter increases
slightly as n increases.
The method of Section 6.7 first gets a good approximation to the solution
of the optimization problem on a coarse grid (state space), and then uses
that approximation as the initial condition for the solution on a finer grid.
We observe experimentally for problems such as the one above that the
optimal acceleration parameter (for the optimal controls) increases slightly
as the grid is refined. This seems to be another consequence of the relative
speed of absorption or killing of the process, which is faster on the coarser
grid. Thus, good values of the weight for the coarse grid would provide a
good start for determining a good weight for the finer grid.

6.5 Domain Decomposition and Implementation on Parallel Processors
Exploiting the possibilities of parallel processing is obviously important
for any numerical method. We mention only three standard schemes of
the domain decomposition type, in order to show some of the possibilities.
Such methods are interesting and undoubtedly useful. The reader should
keep in mind that the main computational problems for the optimal con-
trol problem concern either high dimensional problems (dimension greater
than three), problems with complicated boundary conditions, or problems
where great accuracy is required. Parallelization methods for the multigrid
method of Section 6.7 are a topic of active research, and the reader can find
useful ideas in the proceedings of many conferences devoted to the subject
[117] and in [128].
In principle, the computations (2.7) for the Jacobi method can be done
simultaneously for each of the states, because the values of the (n + 1)-st
iterate Wn+1(x, u) for all x ∈ S are computed directly from the values of
the n-th iterate Wn(y, u), y ∈ S. On the other hand, the computation of
Wn+1(x, u) in (2.10) for each state x depends on the values of the (n + 1)-st
iterate Wn+1(y, u) for the states y < x. Thus a simultaneous calculation of
all Wn+1(x, u) via a parallel processor is not possible. Many intermediate
variations have been designed with the aim of preserving some of the ad-
vantages of the Gauss-Seidel method, but allowing the use of some parallel
processing. One can, in fact, mix the Gauss-Seidel and Jacobi procedures
in rather arbitrary ways, with the state space being divided into disjoint
groups of points, with a Gauss-Seidel procedure used within the groups
and the groups connected via the Jacobi procedure. Such is the case with
the three examples below. One well known scheme, called the red-black
Gauss-Seidel method (for obvious reasons!) will be described next [16]. In
fact, with an appropriate ordering of the states, this method is actually a
Gauss-Seidel method.

Red-Black Gauss-Seidel. Refer to Figure 6.3, where the square state
space for the two dimensional problem of interest is defined.

Figure 6.3. Red-black Gauss-Seidel.

For simplicity, suppose that the boundary is absorbing and let the state
space consist of the points of intersection of the grid lines in the figure.

The circles are to be called "black," and the others (excluding the bound-
ary) "red." Suppose that the transitions of the Markov chain from a black
point are either to the red points or to the boundary (but not to other
black points), and that the transitions from the red points are either to
the black points or to the boundary (but not to other red points). This
would be the case if the transition probabilities were obtained by the finite
difference method of Sections 5.1-5.3 applied to a diffusion (no jump term
and diagonal σ(x)σ'(x)) model.
Suppose that we wish to solve (1.3). The procedure for each iteration is
as follows: Let Wn(u) be the n-th estimate of the solution to (1.3). First
iterate on all of the black points, using a Jacobi relaxation (the Gauss-
Seidel procedure would give the same result, because the black states do
not communicate to other black states), obtaining Wn+1(x, u) at the black
points x. Then using these newly computed values, iterate on all the red
points with a Jacobi procedure to get Wn+1(x, u) for the red points x. The
procedure has divided the state space into two groups, each group having
roughly half of the states, and the computation within each group can be
implemented in a parallel fashion.
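A sketch of one red-black iteration for a discounted nearest-neighbor chain on a square grid follows; the transition probabilities (a symmetric random walk), the discount, and the cost are hypothetical, and the absorbing boundary is handled by holding the boundary values at zero.

```python
import numpy as np

def red_black_sweep(W, C, lam):
    """One red-black iteration for a discounted nearest-neighbor random walk on
    an (n x n) grid with absorbing boundary (boundary values fixed at 0).
    'Black' points are those with (i + j) even, 'red' those with (i + j) odd.
    Within each color, the updates are independent and could run in parallel."""
    n = W.shape[0]
    for parity in (0, 1):                       # black points first, then red
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == parity:
                    W[i, j] = C[i, j] + lam * 0.25 * (
                        W[i - 1, j] + W[i + 1, j] + W[i, j - 1] + W[i, j + 1])
    return W

# hypothetical data: 9x9 grid, unit running cost in the interior, discount 0.95
n, lam = 9, 0.95
W = np.zeros((n, n))
C = np.zeros((n, n)); C[1:-1, 1:-1] = 1.0
for _ in range(200):
    W = red_black_sweep(W, C, lam)
print(W[n // 2, n // 2])
```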

Domain Decomposition (1). Refer to Figure 6.4, where the domain G
is divided into the disjoint sets G1, G2 and the connecting line G0.

Figure 6.4. Domain decomposition (1).

Let the state space S be the intersection of the lines in the figure and sup-
pose that under the given transition probability, the states communicate
only to the nearest neighbors. Thus, the points on G0 communicate with
points in both G1 and G2. The points in G1 (respectively, G2) communicate
only with points in Go and G1 (respectively, Go and G2). A standard de-
composition technique updates Wn(u) via a Gauss-Seidel or Jacobi method
(or accelerated method) in each domain G1 and G2 separately and simulta-
neously. It then uses either a Jacobi or a Gauss-Seidel procedure to update
the values at the points on the connecting set G0.
In principle, the domain can be subdivided into as many sections as
desired, at the expense of increasing overhead.

Figure 6.5. Domain decomposition (2).

Domain Decomposition (2). Refer to Figure 6.5, where the state space
is as in the above two examples, and the states communicate only to their
nearest neighbors as well. A separate processor is assigned to update the
states on each of the "columns" in the figure. Proceed as follows. Let the
n-th estimate Wn(u) be given as above. The memory of the i-th processor
contains the values of Wn(x, u) for states x which are in the (i − 1)-st,
i-th, and (i + 1)-st column. It then obtains Wn+1(x, u) for the states x in
the i-th column by a combined Jacobi and Gauss-Seidel type procedure
which uses Wn(y, u) for y in the (i − 1)-st and (i + 1)-st column, and a
successive substitution for y in the i-th column. After this computation is
done for all columns, the new values for each column are then transferred
to the processors for the neighboring columns.

6.6 Coarse Grid-Fine Grid Solutions


The scalar approximation parameter h will be reintroduced in this and in
the next section. A major problem in getting a "quick" solution to (1.1) or
(1.2) via any type of iterative procedure concerns the initial condition which
is used to start the iteration. To get a good initial condition, it is useful
to first obtain an approximate solution on a coarser state space, where the
calculations require less time. Let h_2 > h_1 = h, with the associated state
spaces satisfying S_{h_2} ⊂ S_{h_1} = S_h. A useful procedure for the solution of
(1.2) is to get an approximation (V^{h_2}, u^{h_2}) to the optimal value function
and associated control on the coarse state space S_{h_2}, and then start the
iteration on the finer state space S_h with an initial condition and control
(V^h, u^h) which are suitable interpolations of (V^{h_2}, u^{h_2}). For example, one
might use the interpolation: V^h(x) = V^{h_2}(x) for x ∈ S_{h_2}, and for x ∈
S_h − S_{h_2}, use a linear interpolation of the values at the smallest set of
points in S_{h_2} in whose convex hull x lies, with an analogous interpolation
for the control. It seems preferable to do several "smoothings" via a Gauss-
Seidel (accelerated or not) relaxation before the first update of the control
on the finer grid, whether iteration in policy space or in value space is used.
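A sketch of the coarse-to-fine initialization on a square grid, with h_2 = 2h so that the coarse points are every other fine point, might look as follows; the bilinear formula used for the in-between fine points is one simple realization of the linear interpolation described above.

```python
import numpy as np

def interpolate_to_fine(V_coarse):
    """Bilinear interpolation of a value function from an (m x m) grid of
    spacing 2h to the corresponding ((2m-1) x (2m-1)) grid of spacing h."""
    m = V_coarse.shape[0]
    n = 2 * m - 1
    V = np.zeros((n, n))
    V[::2, ::2] = V_coarse                                   # points common to both grids
    V[1::2, ::2] = 0.5 * (V[:-2:2, ::2] + V[2::2, ::2])      # midpoints of vertical edges
    V[::2, 1::2] = 0.5 * (V[::2, :-2:2] + V[::2, 2::2])      # midpoints of horizontal edges
    V[1::2, 1::2] = 0.25 * (V[:-2:2, :-2:2] + V[:-2:2, 2::2]
                            + V[2::2, :-2:2] + V[2::2, 2::2])  # cell centers
    return V

# usage: solve the problem on the coarse grid first (not shown), then start the
# fine-grid iteration from the interpolated values; the control is interpolated
# analogously (e.g., by nearest coarse point).
V_coarse = np.random.rand(5, 5)           # stand-in for the coarse-grid solution
V_fine_initial = interpolate_to_fine(V_coarse)
print(V_fine_initial.shape)               # (9, 9)
```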
Such a procedure can be nested. Set h = h_1 < ··· < h_k, such that the
associated state spaces satisfy

S_{h_k} ⊂ S_{h_{k−1}} ⊂ ··· ⊂ S_{h_1} = S_h.   (6.1)

Start the solution procedure by getting an acceptable approximation
(V^{h_k}, u^{h_k}) to the optimal value function and control on the coarsest state
space S_{h_k}. Then interpolate to get an initial condition and initial control
for the iterative procedure for the solution on S_{h_{k−1}}, etc. This method is
perhaps the crudest of the variable grid methods, but it does save consider-
able time over what is needed to get a solution of comparable accuracy by
working on S_h directly. When applied to the optimization problem (1.2),
the solution on a coarser state space might lack some of the local detail of
the structure of the optimal control which one would get from a solution
to the problem on a finer state space, but it quite commonly shows the
salient features of the optimal control, and is itself often good enough for
a preliminary study of the optimal control problem.
The state space refinement method discussed here, together with the ap-
proximation in policy space and the use of accelerated iterative methods,
is relatively easy to program and is a good first choice unless the additional
speed which the multigrid method allows is worth the additional program-
ming effort. The use of such a state space refinement method still leaves
open the choice of the actual method used for solving the problems on the
chosen sequence of state spaces. For this one can use any of the procedures
discussed so far, as well as the multigrid method of the next section.
Adaptive grid methods are also attractive, where the grid is refined adap-
tively according to the evolution of the computation. The grid is refined
depending on the "local" gradient of the current solution. Such methods
hold the promise of higher accuracy with a limited computational bud-
get [121, 122, 123]. The reference [96] discusses various grid decomposition
methods, which are also appropriate for linear or nonlinear elliptic and
parabolic equations which have the structures that arise in control. The
convergence proofs are all probabilistic, from the point of view of Markov
chains.

6.7 A Multigrid Method
6.7.1 The smoothing properties of the Gauss-Seidel iteration
In this section, the state spaces will be referred to as grids. In the previ-
ous section, the problems (1.1) and (1.2) were approximately solved on a
coarse grid. Then the approximate solution and associated control, suit-
ably interpolated, were used as initial conditions for the solution of (1.1)
or (1.2) on the original fine grid. As pointed out, one can nest the proce-
dure and get an initial solution for the coarse grid by starting with an even
coarser grid. In this section, we discuss another procedure, the so-called
multigrid method, of exploiting the relative efficiency of different types of
computations at different levels of grid refinement. The multigrid method
is a powerful collection of ideas of wide applicability, and only a very brief
introduction will be given here. A fuller discussion can be found in [16, 120].
The multigrid method was introduced for use in solving optimal stochastic
control problems by Akian and Quadrat [2, 1], and a full discussion of data
and convergence, under appropriate regularity conditions, can be found in
[2]. Because variable grid sizes will be used, the basic scale factor h will be
used.
A key factor in the success of the method is the "smoothing" property
of the Gauss-Seidel relaxation. In order to understand this and to get a
better idea of which computations are best done at the "coarser level," it
is useful to compare the rate of convergence of the Gauss-Seidel procedure
when acting on smooth and on more oscillatory initial conditions. We will
do this for a discretization of the simplest problem, where the system is
x(t) = x + w(t), where w(·) is a standard Wiener process, and the process
is absorbed on hitting the endpoints {0, 1} of the interval of concern [0, 1].
For the discretization, let h = 1/N, where N is an integer. The absorbing
points will be deleted, since they do not influence the rate of convergence.
Then, the state space is Sh = {h, ... , 1- h}, a set with N- 1 points.
Let R^h = {r^h(x, y), x, y ∈ S_h} denote the transition matrix of the locally
consistent approximating Markov chain which is absorbed on the boundary
and is defined by

        [ 0    1/2   0    ··· ]
R^h =   [ 1/2  0    1/2   ··· ]
        [ 0    1/2   0    ··· ]
        [ ···  ···   ···      ]
Since only the role of the initial condition is of concern, set C^h(x) = 0.
Consider the Gauss-Seidel relaxation W_n^h → W_{n+1}^h, defined by

W_{n+1}^h(x) = Σ_{y<x} r^h(x, y) W_{n+1}^h(y) + Σ_{y≥x} r^h(x, y) W_n^h(y).   (7.1)
Refer to Figure 6.6, where h = 1/N = 1/50 and the initial condition is the
solid line. The two dotted lines show the values after 10 and 50 iterations.

Figure 6.6. A Gauss-Seidel procedure. Smooth initial condition.

Now refer to Figure 6.7, where the initial condition for the same problem
is the oscillatory solid line, and the dotted lines have the same meaning
as above. Note that the algorithm converges much faster here than in Fig-
ure 6.6 for the smoother initial condition. The situation just described is
actually the general case for Markov chain approximations to the Wiener
process with drift. Loosely speaking, the more "oscillatory" the initial con-
dition [i.e., the energy in the initial condition being essentially in the higher
eigenvalues of Rh] for the iteration (7.1), the faster the convergence. When
the initial condition in (7.1) is "smooth," it is reasonable to suppose that
we can get a quite good approximation to the solution of (1.1) by work-
ing with a coarser grid, because the errors due to the "projection" onto
the coarser grid or the "interpolation" of the resulting value back onto the
finer grid would not be "large."
See Figure 6.8, where the problem with the smooth initial condition is
treated on a grid with the coarser spacing 2h. The times required for the
computation for the cases of Figure 6.8 are approximately the same as
those for the cases in Figure 6.6, although the result is much better in
Figure 6.8. The smoother the initial condition, the better the quality of the
approximation on a coarser grid. The above comments provide an intuitive
picture of the role of the computations on the coarser grid as well as of
the smoothing property of the Gauss-Seidel relaxation. This "smoothing"
property provides one important basis for the multigrid method. Fuller and
very enlightening discussions appear in the introductory books [16, 120].
The cited "smoothing" property of the Gauss-Seidel relaxation is not
quite shared by the Jacobi relaxation: w:+l = Rhw: + Ch. Consider the
initial condition Wt defined by Wt(x) = sin(k1rx), 1 ~ k < N. Then the
iteration does converge faster as k increases, up to about the "midpoint"
178 6. Computational Methods

x 0
~
·0.2

-0.4

-0.6

-0.8

-I
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Figure 6.7. A Gauss-Seidel procedure. Oscillatory initial condition.

0.9

0.8

0.7

0.6

x 0.5
~
0.4

0.3

0.1 0.2 0.3 0.4 0.5 0.6 0.7

Figure 6.8. Smooth initial condition, coarser grid.

k = N /2, after which the convergence slows down again. But the weighted
Jacobi method can be used to get the faster convergence for the more
oscillatory initial conditions, for weights w less than unity (recall that we
used weights greater than unity in Section 6.4, but there the weighting
served a different purpose). See [16, Chapter 2] for a fuller discussion.
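The smoothing behavior discussed above is easy to reproduce numerically. The sketch below applies the sweep (7.1) for the absorbed random walk (with C^h = 0, so the iterates are themselves the errors) to a smooth and to an oscillatory initial condition; the grid size and the number of sweeps are arbitrary choices.

```python
import numpy as np

def gauss_seidel_sweeps(W, n_sweeps):
    """Apply n_sweeps of the sweep (7.1) for the random-walk matrix
    r(x, x +- h) = 1/2 with absorbing endpoints (C = 0), sweeping left to right."""
    m = len(W)                          # interior points h, 2h, ..., 1-h
    for _ in range(n_sweeps):
        for i in range(m):
            left = W[i - 1] if i > 0 else 0.0        # already-updated neighbor
            right = W[i + 1] if i < m - 1 else 0.0   # old neighbor (or boundary)
            W[i] = 0.5 * (left + right)
    return W

N = 50
x = np.arange(1, N) / N                 # the states h, ..., 1-h with h = 1/N
smooth = np.sin(np.pi * x)              # k = 1: smooth initial condition
rough = np.sin((N - 1) * np.pi * x)     # k = N-1: oscillatory initial condition

for label, W0 in (("smooth", smooth), ("oscillatory", rough)):
    W = gauss_seidel_sweeps(W0.copy(), n_sweeps=10)
    print(label, np.abs(W).max())       # the oscillatory error is reduced far more
```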
The so-called smoothing properties of the Gauss-Seidel and weighted
Jacobi iteration are shared by systems other than the "random walk" dis-
cretization of the Wiener process given above. A fuller discussion is beyond
our scope here. But we note that multigrid methods of the type discussed in
the next subsection seem to work well on all of the problem classes to which
they were applied, including the ergodic, heavy traffic and singular control
problems of Chapters 7 and 8, even when their use could not be rigorously
justified. It should be kept in mind that the entire discussion concerning
smoothness is heuristic, since not much is known about the smoothness of
the solutions of the equations of interest, or about the smoothing properties
of the Gauss-Seidel relaxations which arise from Markov chain approxima-
tions to "degenerate" control problems.
The previous discussion suggests a procedure of the following type for
the solution of the problem (1.1). First, do several Gauss-Seidel relaxations
(accelerated or not) on the finer grid. The error between the trial solution
at this stage and the true solution should be "smoother" than the error
at the start. Then estimate the error via a computation on a coarser grid.
With the estimate in hand, correct the last trial solution on the finer grid
and continue. This rough outline will now be formalized. The reader should
keep in mind that there are numerous variations of the general idea, and we
present only the simplest concepts. Our interest is in providing motivation
and a rough outline. More detail and much practical information is in [119].
Detail on the stochastic problem is in [2].

6.7.2 A multigrid method
We will be concerned with the solution of the equation (1.1) on the state
space S_h, and we rewrite (1.1) here for convenience:

W^h(u) = R^h(u)W^h(u) + C^h(u).   (7.2)

Let the operator T_u^h(X^h, C^h(u)) denote either the Gauss-Seidel or acceler-
ated Gauss-Seidel relaxation (2.10), (4.2), or (4.3) applied to (7.2):

X^h → T_u^h(X^h, C^h(u)).

For the Gauss-Seidel relaxation,

T_u^h(X^h, C^h(u)) = R̃^h(u)X^h + C̃^h(u),

where the quantities R̃^h(u) and C̃^h(u) are defined by (2.13) and (2.14).
These relaxations will be used below with various values of h and re-
placements for C^h(u). Because a variable grid method will be used and
it will be necessary to work with S_h as well as "coarser" state spaces, a
simplifying assumption will be made on the state space. Once the underly-
ing ideas are understood, it should be apparent that the approach can be
applied to other "shapes," provided that the appropriate projection and
interpolation operators (see below) can be defined. The state space S_h will
be a "regular rectangular grid," defined as follows. Let G be a hyperrect-
angle in ℝ^r which is centered about the origin. Let S_h = G ∩ ℝ_h^r. For some
integer k > 1, let each side of the hyperrectangle be an integral multiple
of 2^{k−1}h. This last requirement is introduced so that each coarser grid can
be defined as a subset of the finer grids, without any further notational
difficulties. k is called the level of the procedure. Define the sequence of
grid spacings h_i = h2^{i−1}, i = 1, ..., k, and the associated state spaces
S_{h_i} = ℝ_{h_i}^r ∩ G = ℝ_{h_i}^r ∩ S_h. Thus, h_1 = h and R^{h_i}(u) is the R^h(u) in (7.2)
with h replaced by h_i.
Next, a specific method of splitting the computation between the differ-
ent levels will be motivated. That will be followed by a somewhat loose
outline of the realization for a two-level procedure, and then a general
k-level procedure will be defined. Now consider the following procedure.
Let W_0^h denote an initial guess of the solution to (7.2). Then, starting
with the initial condition X^h = W_0^h, do several (say, n_1) Gauss-Seidel or
accelerated Gauss-Seidel relaxations

X^h → T_u^h(X^h, C^h(u)),   (7.3)

ending up with the final value which will be denoted by W_1^h. Next, define
the residual ρ_0^h via a Jacobi (or Gauss-Seidel) relaxation

ρ_0^h = T_{J,u}^h(W_1^h, C^h(u)) − W_1^h,   (7.4)

and define the error δW_1^h = W^h(u) − W_1^h. Then we can write

δW_1^h = R^h(u)δW_1^h + ρ_0^h.   (7.5)

If (7.5) can be solved for δW_1^h, then the solution to (7.2) is given by

W^h(u) = W_1^h + δW_1^h.   (7.6)

Loosely speaking, the discussion in the last subsection concerning the
"smoothing" properties of the Gauss-Seidel relaxation suggests that the
residual ρ_0^h, which is defined on the points of S_h, will be "smooth." This
suggests that a good approximation to the solution to (7.5) can be ob-
tained by solving a "projection" of (7.5) onto a coarser grid (say, S_{2h}), and
then "interpolating the approximation back" to the original grid S_h. This
discussion will now be formalized as an algorithm.

A Simple Two Level Multigrid Procedure. In order to carry out the
program outlined above, we need to be more precise concerning the notions
of projection from a finer grid to a coarser grid and interpolation from a
coarser grid to a finer grid. Consider the following two level method. Define
the projection or restriction operator I_h^{2h} from S_h to S_{2h} by

(I_h^{2h} ρ)(x) = ρ(x), for x ∈ S_{2h}.

Thus, I_h^{2h} acting on the function ρ(·) simply picks out the values for the
points which are common to both S_h and S_{2h}. This is perhaps the crudest
projection. See further comments at the end of the section.

(a) Given the initial guess W_0^h of the solution to (7.2), do n_1 accelerated
Gauss-Seidel iterations, getting an updated estimate of W^h(u) which will
be called W_1^h. Solve for the residual ρ_0^h by (7.4).
(b) Then, starting with the initial condition δX^{2h} = 0, get an approximate
solution δW_1^{2h} (say, via n_2 accelerated Gauss-Seidel iterations) to the equa-
tion

δW^{2h} = R^{2h}(u)δW^{2h} + I_h^{2h} ρ_0^h   (7.7)

on S_{2h}.
(c) Next, given δW_1^{2h}, we need to interpolate it "back to the finer" grid
S_h. To do this, define an interpolation operator I_{2h}^h which takes functions
defined on S_{2h} into functions defined on S_h. One natural choice is the
following: Let ρ be defined on the points of S_{2h}. For x ∈ S_{2h}, set (I_{2h}^h ρ)(x) =
ρ(x). For x ∈ S_h − S_{2h}, define (I_{2h}^h ρ)(x) to be the linear interpolation of
the values at the smallest set of points in S_{2h} in whose convex hull x lies. This
is perhaps the simplest type of interpolation operator. The new trial solution
to (7.2) is now defined by the updated W_1^h:

W_1^h → W_1^h + I_{2h}^h δW_1^{2h}.   (7.8)

(d) The procedure is now repeated. Starting with the initial value W_1^h,
get a new trial solution W_2^h to (7.2) [say, by taking n_1 accelerated Gauss-
Seidel iterations (7.3)], calculate the residual ρ_1^h = T_{J,u}^h(W_2^h, C^h(u)) − W_2^h,
project the residual onto S_{2h}, get an approximate solution δW_2^{2h} to (7.7),
where ρ_1^h replaces ρ_0^h, interpolate δW_2^{2h} back to S_h by use of I_{2h}^h, and
compute the new trial solution to (7.2) by updating as

W_2^h → W_2^h + I_{2h}^h δW_2^{2h},   (7.9)

and continue until the given stopping criterion is satisfied.
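A compact one-dimensional sketch of steps (a)-(d) is given below. The fine-grid problem W = RW + C uses the discounted random-walk matrix of Subsection 6.7.1 as a stand-in, the restriction is injection, the interpolation is linear, and the iteration counts n1 and n2 are arbitrary; it is meant only to show the mechanics of the two-level correction, not a tuned multigrid solver.

```python
import numpy as np

def gs(R, C, W, sweeps):
    """Gauss-Seidel sweeps for W = R W + C (used as smoother and coarse solver)."""
    for _ in range(sweeps):
        for x in range(len(C)):
            W[x] = R[x, :x] @ W[:x] + R[x, x:] @ W[x:] + C[x]
    return W

def random_walk_matrix(m, gamma):
    """gamma * (absorbed symmetric random walk) on m interior points."""
    R = np.zeros((m, m))
    for x in range(m):
        if x > 0: R[x, x - 1] = 0.5 * gamma
        if x < m - 1: R[x, x + 1] = 0.5 * gamma
    return R

def two_level_cycle(R_h, C_h, W, R_2h, n1=3, n2=20):
    W = gs(R_h, C_h, W, n1)                        # (a) smooth on the fine grid
    rho = (R_h @ W + C_h) - W                      #     residual, as in (7.4)
    rho_2h = rho[1::2]                             #     restriction by injection
    dW_2h = gs(R_2h, rho_2h, np.zeros(len(rho_2h)), n2)   # (b) coarse correction (7.7)
    dW = np.zeros_like(W)                          # (c) interpolate back to the fine grid
    dW[1::2] = dW_2h
    dW[2:-1:2] = 0.5 * (dW_2h[:-1] + dW_2h[1:])
    dW[0], dW[-1] = 0.5 * dW_2h[0], 0.5 * dW_2h[-1]
    return W + dW                                  # (d) corrected trial solution

N, gamma = 32, 0.95
m = N - 1                                          # fine interior points h, ..., 1-h
R_h = random_walk_matrix(m, gamma)
R_2h = random_walk_matrix(N // 2 - 1, gamma)
C_h = np.ones(m)
W = np.zeros(m)
for _ in range(20):
    W = two_level_cycle(R_h, C_h, W, R_2h)
print(np.max(np.abs(W - np.linalg.solve(np.eye(m) - R_h, C_h))))
```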

A General k-Level Method. For i = 1, ..., k − 1, define the projection
operators I_{h_i}^{h_{i+1}} and interpolation operators I_{h_{i+1}}^{h_i} analogously to the defini-
tions used for i = 1 in the two-level case above, and let n_1, ..., n_k be given
integers. Set n = 0, and let W_0^h be an initial guess of the solution to (7.2).
Consider the following procedure.
1. Starting with the initial value X^h = W_n^h, do n_1 iterations of

X^h → T_u^h(X^h, C^h(u)),

and denote the final value by W_{n+1}^h.
2. Calculate the residual ρ_n^h by

ρ_n^h = T_{J,u}^h(W_{n+1}^h, C^h(u)) − W_{n+1}^h.

3. Define ρ̄^{2h} = I_h^{2h} ρ_n^h, and starting with the initial value δX^{2h} = 0, do
n_2 iterations of

δX^{2h} → T_u^{2h}(δX^{2h}, ρ̄^{2h}),

ending up with the final value which we denote by δW_{n+1}^{2h}.
4. Calculate the residual ρ_n^{2h} by

ρ_n^{2h} = T_{J,u}^{2h}(δW_{n+1}^{2h}, ρ̄^{2h}) − δW_{n+1}^{2h}.

5. Define ρ̄^{4h} = I_{2h}^{4h} ρ_n^{2h}, and starting with the initial value δX^{4h} = 0, do
n_3 iterations of

δX^{4h} → T_u^{4h}(δX^{4h}, ρ̄^{4h}),

ending up with the final value which we denote by δW_{n+1}^{4h}.
6. Continue until level k is reached and δW_{n+1}^{h_k} is obtained.
7. Given δW_{n+1}^{h_k}, reset the value δW_{n+1}^{h_{k−1}} at the (k − 1)-st level as

δW_{n+1}^{h_{k−1}} → δW_{n+1}^{h_{k−1}} + I_{h_k}^{h_{k−1}} δW_{n+1}^{h_k}.

With this new initial condition, δX^{h_{k−1}} = δW_{n+1}^{h_{k−1}}, do n_{k−1} iterations of

δX^{h_{k−1}} → T_u^{h_{k−1}}(δX^{h_{k−1}}, ρ̄^{h_{k−1}}),

ending up with the final value which is also denoted by δW_{n+1}^{h_{k−1}}.
8. Continue back up through the levels in the same way. At the first level,
reset the value of W_{n+1}^h as W_{n+1}^h → W_{n+1}^h + I_{2h}^h δW_{n+1}^{2h}, and with this
new initial condition, do n_2 iterations of

X^h → T_u^h(X^h, C^h(u)),

ending up with the final value which is also denoted by W_{n+1}^h.
9. If the stopping criterion is satisfied, then stop. Otherwise, increase n
by 1 and return to step 1 or 2.
The Gauss-Seidel relaxation can also be used for calculating the residual.
The procedure just described is referred to as a V-cycle, since one descends
from the highest (first) level to the lowest (k-th) and then back to the
highest. One can vary the procedure in many ways; for example, the so-
called W-method goes from the highest level to the lowest, then back
to an intermediate level, then down to the lowest, and finally back up
to the highest. Such methods are often preferred for the solution of the
linear equations which arise from discretizations of many types of partial
differential equations. See, e.g., the proceedings in [117].

Comments on Computation. The method seems to be robust. It has
been used successfully on all of the classes of problems dealt with in this
book. It tends to be at least as good as the other methods, and is generally superior.
book. It tends to be as good as the other methods, and is generally superior.
There is no convergence proof available for the general problem of interest
here, and it has been observed on some problems that the values eventually
oscillate slightly. But even then, the asymptotic errors have been quite small
for the problems that have been tried by the authors. In any case, much
more experimentation is needed on the great variety of stochastic control
problems.
Due to the crudity of the projection operator, as well as to the poorly
understood structure of the general problems of interest, we generally used
several accelerated Gauss-Seidel relaxations between each level of projec-
tion/interpolation. Thus these relaxations actually contribute to the con-
vergence via their contraction as well as by their smoothing property. In
much multigrid practice, acceleration is not used, and one uses fewer relax-
ations, as well as more sophisticated schemes such as the W-cycle. It was
pointed out in [64] that the smoothing property of the accelerated Gauss-
Seidel procedure deteriorates as the acceleration parameter increases to its
optimal value.
The projection operator I_{h_{k−1}}^{h_k} described above is the simplest possible.
It is not a "smooth" operator, and the success of its use is increased with
added smoothing relaxations. A common alternative is to define the projec-
tion (I_{h_{k−1}}^{h_k} ρ^{h_{k−1}})(x) to be the arithmetic mean of the ρ^{h_{k−1}}(y) for points y which
can be reached in one step (including diagonals) from x. Alternatively, de-
fine the interpolation operator first and then define the projection operator
to be its transpose. The R^{h_k}(u) on the coarser grids can also be chosen in
direct the reader's attention to the algebraic multigrid method [119], where
the interpolation and projection operators (as well as the coarser grids) are
tailored to the actual values of the transition probabilities.

6.8 Linear Programming Formulations and Constraints

6.8.1 Linear programming
The solution to (1.4) can also be obtained via linear programming (LP).
At this time, it does not appear that a direct use of the simplex method
of linear programming is better than any of the other methods discussed
previously in this chapter. But an LP formulation does allow the introduc-
tion of "mean value" constraints on the paths or control effort, and the LP
codes are being improved continuously. First, the formulation of the gen-
eral linear programming problem will be reviewed. Then the problem of
computing the minimum values V(x) and minimizing controls will be put
into an LP form, and both the primal and dual equations exhibited. It will
be seen that the dual equations are just (1.4). The basic approach follows
the development in [105]. Assumption (A1.2) will be used throughout.
An optimal stopping problem with a constraint on the mean time to
stopping was discussed in [90]. Problems with reflected diffusion models and
constraints were dealt with in [106] and [153], and will not be discussed here.
Other references which have dealt with LP formulations of Markov chain
optimization problems are [38] and [81]. Only a few basic facts concerning
linear programming will be stated, and the reader is referred to any of the
many standard references (e.g., [9, 61]) for more detail.

The Basic LP. For a column vector b, row vector c, and a matrix A,
all of compatible dimensions, the basic LP form is the following. Choose
X = (X1, ..., Xq) satisfying

AX = b,   X ≥ 0,   (8.1)

and which minimizes

z = cX.

All inequalities between vectors are assumed to hold component by com-
ponent. Thus X ≥ 0 means that all components of X are nonnegative.
Equation (8.1) is called the primal problem. The dual problem is defined to
be (Yi can have any sign)

max b'Y : Y'A ≤ c.   (8.2)

Let Ai denote the i-th column of A. Define the row vector D = c − Y'A.
In terms of the components, Di = ci − Y'Ai. The so-called complementary
slackness condition is

Di Xi = 0, for all i.   (8.3)

We say that a vector X ≥ 0 (respectively, Y) is primal (respectively, dual)
feasible if AX = b (respectively, Y'A ≤ c). A well known fact in LP is that
a vector X is optimal if and only if it is primal feasible, and there is a dual
feasible Y such that complementary slackness holds.

The Simplex Procedure. Only a rough outline will be given. Let m
denote the number of rows of A and q the number of columns (the dimension
of the vector X), and suppose that A has more columns than rows. Let A
be of full rank (this is the case in the Markov chain optimization problem).
Suppose that there is a primal feasible solution X = (X1, ..., Xq) which can
be partitioned as follows: There is a set of indices I(X) = {i1, ..., im} such that for j ∉
I(X) we have Xj = 0, and the matrix {A_{i1}, ..., A_{im}} = B is nonsingular.
Then X is said to be a basic feasible solution, the components {Xj, j ∈
I(X)} are said to be the basic variables or basis, and B is said to be the basis
matrix. If there is an optimal solution, then there is one in the class of basic
feasible solutions. The simplex procedure computes the optimum by getting
a minimizing sequence in the class of basic feasible solutions. Suppose that
we have a basic feasible solution X. Let us reorder the components so
that we can write X as X = (X_B, X_{NB}), where X_B is the vector of basic
variables and X_{NB} is the vector of nonbasic variables. Similarly partition
the matrix A as A = [B, (NB)], and write c = (c_B, c_{NB}).
With this new notation, we can write the cost function and constraints
(8.1) as

c_B X_B + c_{NB} X_{NB} = z,
B X_B + (NB) X_{NB} = b.   (8.4)

We wish to replace the current basis by one with a smaller cost, if possible.
To help us see what needs to be done, the equation (8.4) will be transformed
into a form which allows us to observe directly the derivatives of the cost
with respect to each of the basic and nonbasic variables. Let us subtract
a linear combination of the rows of the second equation of (8.4) from the
first equation such that the term c_B X_B is canceled. The multipliers in the
linear combination are easily seen to be Y' = c_B B^{−1}. Then rewrite (8.4)
as

(0)X_B + [c_{NB} − Y'(NB)]X_{NB} + Y'b = z,   (8.5)
X_B + B^{−1}(NB)X_{NB} = B^{−1}b.   (8.6)
Because X_{NB} = 0, (8.5) implies that z = c_B B^{−1}b, the cost under the
current basis. The multiplier vector Y is not necessarily dual feasible.
At the current basic solution X, equation (8.5) implies that Di = ci −
Y'Ai is the derivative of the cost with respect to the variable Xi. Since
Di = 0 if Xi is a basic variable, the derivative of the cost (at the current
vector X) with respect to a basic variable is zero. Note that if Di ≥ 0 for
all i, then Y is dual feasible, and complementary slackness holds. In the
actual simplex procedure, B^{−1} is not actually explicitly computed anew
at each update of the basis. It is calculated in a relatively simple manner
using the previous basis inverse.
If Di ≥ 0 for all i, then the current solution is optimal, and the proce-
dure can be stopped. Otherwise, it is possible to select an improved basic
solution. To do this, define an index i0 by

min_j Dj = D_{i0},

and introduce the nonbasic variable X_{i0} into the basis at the maximal
level consistent with the preservation of feasibility. This will involve the
elimination of one of the current basic variables from the basis. Note that

X_B = B^{−1}b − B^{−1}A_{i0} X_{i0}.   (8.7)

It can be seen from (8.7) that as X_{i0} increases from the value zero, at least
some of the components of the vector X_B might have to change value to
assure feasibility. There are three possibilities. Case (a): X_{i0} can increase
without bound, with the cost going to −∞. This would occur if none of the
components of B^{−1}A_{i0} were positive. In this case, stop. Case (b): It might
be impossible to raise X_{i0} above zero without driving some other basic
variable X_{i1} (which would have to have been at the zero level) negative.
In this case, simply replace X_{i1} by X_{i0} in the basis. Case (c): X_{i0} can be
increased to a finite but nonzero level before some other basic variable, say
X_{i1}, becomes zero. Then replace X_{i1} by X_{i0} in the basis and set it at the
maximum level. The procedure is then repeated. The exact details of this
so-called "pivoting" procedure are not important for the discussion here.
The method is not quite a gradient procedure, since it attempts to get a
decrease in cost by increasing the value of only a single nonbasic variable
at a time, with the other variables changing their values only to the extent
necessary to maintain feasibility.

6.8.2 The LP formulation of the Markov chain control problem

The problem of computing the minimum cost and minimizing control for
the Markov chain control problem [i.e., solving (1.4)] will now be set up as
an LP problem. Assume (A1.2). If, for a given control u(·), R(u) in (1.4) is a
contraction, then Σ_y r(x, y|u(x)) < 1 for at least one state x. But R(u) is
still a transition probability matrix for some "killed" Markov chain, which
we denote by {ξn, n < ∞}. In this subsection, the variables i, j will be used
to denote the states of the chain, and not the x, y of the previous sections.
The development is simpler if the control action space U contains only
a finite number of points. Many cases of practical interest reduce to this.
For example, let U = [−1, 1], and suppose that the dynamics of the original
system are linear in the control and the cost function either does not depend
explicitly on the control or else depends linearly on its absolute value.
Then the control will take one of the three values {0, −1, 1}, so an a priori
restriction to this finite set can be made. If the finiteness assumption does
not hold, then the so-called "generalized programming" [111] could be used
and would lead to the same conclusions. Let m denote the number of points
in S, the state space of the chain, and write U = {α1, ..., αL}. The LP
formulation is in terms of randomized Markov or (equivalently) randomized
feedback controls. Thus, the control u(·) is to be defined by the conditional
probabilities

γik = P{un = αk | ξn = i}.

Suppose that the initial state ξ0 is a random variable with the probability
distribution

P{ξ0 = i} = pi > 0, for all i ∈ S.

It will turn out that the values of pi do not affect the optimal control,
unless there are added constraints [105]. Let E_p denote the expectation of
functionals of the Markov chain which is associated with whatever R(u)
is used and the distribution p of the initial condition ξ0. (The u(·) will be
clear from the context.) For i ∈ S and k ≤ L, let Mik denote the joint
(state, control) mean occurrence times defined by

Mik = Σ_{n=0}^∞ E_p 1{ξn = i, un = αk}.

Thus, Mik is the mean number of times that the state is i and the action is
αk simultaneously. We emphasize that these mean values are for the chain
with the degenerate transition matrix R(u). The probability that action αk
is used when the state is i is

γik = Mik/Mi,   (8.8)

where Mi = Σ_k Mik. The mean occupancy times satisfy the equation

Σ_k Mik = pi + Σ_{j,k} r(j, i|αk) Mjk,  i ∈ S,
Mik ≥ 0, all i, k.   (8.9)

In terms of the control variables, (8.9) equals

Mi = pi + Σ_{j,k} r(j, i|αk) γjk Mj.   (8.9')

Equation (8.9) is the primal constraint equation, the analogue of the
constraint (8.1) for the Markov chain problem. Let u(·) denote the ran-
domized control defined by (8.8). Let us define the total weighted cost
Wp(u) = Σ_i W(i, u)pi in terms of the Mik as

Wp(u) = Σ_{i,k} C(i, αk) Mik = cM,   (8.10)

which is a linear function of the Mik. The constraints (8.9) and cost (8.10)
constitute the LP formulation of the optimization problem. Additional con-
straints of the form

Σ_{i,k} dikj Mik ≤ qj

can be added.
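As an illustration, the primal LP (8.9), (8.10) can be handed to a generic LP solver. The two-state, two-action data below are hypothetical, and scipy's linprog is used only as an off-the-shelf solver for the equality-constrained problem in the variables Mik.

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical data: m = 2 states, L = 2 actions, discounted ("killed") chain
m, L = 2, 2
beta = 0.9
r = beta * np.array([            # r(i, j | alpha_k), substochastic
    [[0.8, 0.2], [0.3, 0.7]],    # action alpha_1
    [[0.5, 0.5], [0.9, 0.1]],    # action alpha_2
]).transpose(1, 2, 0)            # reindex as r[i, j, k]
C = np.array([[1.0, 2.0],        # C(i, alpha_k)
              [0.5, 3.0]])
p = np.array([0.5, 0.5])         # initial distribution p_i

# variables M[i, k] flattened as M[i*L + k]; constraint (8.9):
#   sum_k M_ik - sum_{j,k} r(j, i | alpha_k) M_jk = p_i,   M_ik >= 0
A_eq = np.zeros((m, m * L))
for i in range(m):
    for j in range(m):
        for k in range(L):
            A_eq[i, j * L + k] -= r[j, i, k]
    for k in range(L):
        A_eq[i, i * L + k] += 1.0
res = linprog(c=C.ravel(), A_eq=A_eq, b_eq=p, bounds=(0, None), method="highs")

M = res.x.reshape(m, L)
policy = M.argmax(axis=1)        # at a basic optimal solution, one positive M_ik per state
print(res.fun, policy)
```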

Comments. Any basic feasible solution to (8.9) will have at most m
nonzero elements. By definition, the basis contains exactly m of the vari-
ables Mik. Because we have assumed that pi > 0 for all i, for each i we
must have Mik > 0 for at least one value of k. In addition, the number
of constraints in the first line of (8.9) equals the number m of states in
S. These considerations imply that, for each i, there is one and only one
value of k [to be called k(i)] for which Mik > 0. Otherwise, more than m
value of k [to be called k( i)] for which Mik > 0. Otherwise, more than m
of the variables Mik would be positive. Thus, any basic feasible solution
yields a pure Markov control: There is no randomization. If additional lin-
ear constraints were added to (8.9), then the basic solution might contain
randomized controls for some of the states.
Suppose that u(·) is a control which is determined by the probability law
{γik}. That is,

P{ξ_{n+1} = j | ξn = i, u(·) used} = Σ_k r(i, j|αk) γik = r(i, j|u).

Let R(u) = {r(i, j|u), i, j ∈ S} be a contraction. Then (8.9') has a unique
solution, and this solution must be the vector of mean occupancy times.
The solutions to (8.9) and (8.9') are identical. Hence, the solution to (8.9)
is also the vector of mean occupancy times.
Let u0(·) be a pure Markov (nonrandomized) control such that R(u0) is
a contraction. Such a control exists by assumption (A1.2). Let us start the
simplex iteration with this control. That is, the variables in the starting
basis are {M_{i,u0(i)}, i ∈ S}. Let {M_{ik}^0} denote the associated initial feasible
solution to (8.9). Let M^n = {M_{ik}^n, i, k} denote the n-th feasible solu-
tion generated by the simplex procedure, and define γ_{ik}^n = M_{ik}^n/M_i^n. Let
{un(·)} denote the associated control law. The sequence of costs cM^n is
nonincreasing in n. Hence, by (A1.2) and an argument like that used in the
last paragraph, the R(un) are all contractions, the solutions to (8.9) and
(8.9') are equal and unique, and {M_{ik}^n, i, k} are the mean occupancy times
for all n. See [105] for proofs and further discussion.

The Dual LP. Let Yi denote the dual variables for the LP (8.9), (8.10).
Then, by analogy to the relation between (8.1) and (8.2), we see that the
dual equations to (8.9) can be written as

Yi ≤ Σ_j r(i, j|αk) Yj + C(i, αk), for all i, k.   (8.11)

The dual cost is

Σ_i pi Yi.   (8.12)

The complementary slackness conditions (8.3) can be expressed in the fol-
lowing form:
(a) If (8.11) is a strict inequality for i, k, then Mik = 0.
(b) If Mik > 0, then there is strict equality in (8.11) for i, k.
Recall that Mik > 0 for one and only one value of k for each i. Thus the
complementary slackness conditions reduce to

Yi = min_k [Σ_j r(i, j|αk) Yj + C(i, αk)],   (8.13)
which is just the dynamic programming equation (1.4). Thus, the optimal
dual variables are the minimum cost values: Yi = V(i).

The Simplex Procedure and Approximation in Policy Space. Let
us recapitulate the simplex procedure for the Markov chain optimization
problem. The appropriate forms of the components of the vector of "re-
duced costs" Dik = cik − Y'Aik are

Dik = C(i, αk) + Σ_j r(i, j|αk) Yj − Yi.   (8.14)

Let M_{i,k(i)}, i ∈ S, denote the current set of basic variables. Then D_{i,k(i)} = 0
for all i ∈ S. By examining the set of linear equations

0 = C(i, α_{k(i)}) + Σ_j r(i, j|α_{k(i)}) Yj − Yi,   (8.15)

we see that they are the same as (1.3) for the current control u(i) = α_{k(i)}.
This implies that Yi = W(i, u). If Dik ≥ 0 for all i and k, then stop, since
optimality has been achieved. Otherwise, define (i0, k0) by

min_{i,k} Dik = D_{i0,k0}.

Then the new control for state i0 will be α_{k0}. The controls for the other
states i ≠ i0 remain as before. The pivoting procedure calculates the cost
for the new control. Thus, approximation in policy space is equivalent to
the simplex procedure if the control for only one state (the one yielding
the most negative derivative of the cost) is changed on each policy update
(as opposed to the possibility of changing the controls for all of the states
simultaneously). See [105].
7
The Ergodic Cost Problem: Formulation and Algorithms

In this chapter, we reformulate some of the concepts in the last chapter so
that they can be used on the ergodic cost problem. Before doing that it
is useful to discuss the appropriate dynamic programming equations and
some additional background material. The natural state spaces for control
problems that are of interest over a long time interval are often unbounded,
and they must be truncated for numerical purposes. One standard way of
doing this involves a reflecting boundary, and this is the case dealt with in
this chapter. Thus, there are no absorbing states. The basic process is the
controlled diffusion {5.3.1) or jump diffusion {5.6.1). The approximating
Markov chains {e~, n < oo} will be locally consistent with these processes.
As in the previous chapters, Sh denotes the state space, and act c Sh the
set of reflecting states. Recall that the reflection is instantaneous and that
we use ~th(x,a) = 0 for X E act. For a feedback control u(·), the cost
function of interest for the original process is

'Y(x, u) = lim:up ~ 1T E;k(x(s), u(x(s)))ds,


where the function k(·) is continuous. If the limit does not depend on the
initial state x, then we omit it from the notation.
In the first few sections, we leave the approximation problem aside and
work with general controlled Markov chain models. In Section 7.1, the ap-
propriate form of the dynamical equation for the value function for the
ergodic cost problem for a fixed control is given, as well as the dynamic
programming equation for the optimal control and cost. Sections 7.2 and
7.3 concern the appropriate forms of the approximation in value space and

approximation in policy space algorithms for the case where there is a sin-
gle ergodic class under each feedback control. The matrices P(u) which
appear in these algorithms are not contractions. To formulate the numer-
ical algorithms, a centered form of the transition matrix is introduced in
Section 7.3. This centered form enjoys the contraction property. The appropriate
adaptations of the numerical algorithms of Chapter 6 are discussed
in Section 7.4.
In Section 7.5, we return to the ergodic cost problem for the approximat-
ing chain. The appropriate form of the cost function for the approximating
Markov chain is given together with a heuristic discussion of why it is
the correct one. The dynamic programming equation for the optimal value
function is also stated. Owing to the possibility of non constant interpo-
lation intervals, and to our desire that the limits of the cost functions for
the chains approximate that for the original diffusion or jump diffusion,
the formulation is slightly different from the one used for the Markov chain
model of Sections 7.1-7.4. In Section 7.6, it is shown that the dynamic
programming equation given in Section 7.5 is also the correct one for the
analogous ergodic cost problem for the continuous parameter Markov chain
interpolation ψ^h(·) introduced in Chapter 4.
The main difficulty in directly applying the computational methods of
Chapter 5 to the approximating chain and the ergodic cost problem stems
from the fact that the interpolation interval might depend on either the
state or the control. With this dependence, the dynamic programming
equation for the ergodic cost problem cannot usually be easily solved by
a recursive procedure. In Section 7.7, we review the procedure for getting
chains with constant interpolation intervals from general approximating
chains and discuss the computational consequences.
In Sections 7.6 and 7.7, we suppose that there is neither a cost nor a
control acting on the boundary (the boundary must be reflecting and not
absorbing). Section 7.8 gives the few changes that are needed for a more
general case.

7.1 The Control Problem for the Markov Chain:


Formulation
Let {ξ_n, n < ∞} be a controlled Markov chain on a finite state space S.
The control at each state x takes values in the compact set U. For an
admissible feedback law u(·), let P(u) = {p(x,y|u(x)), x,y ∈ S} denote
the corresponding transition matrix. Unless otherwise mentioned, we use
the following assumptions.

A1.1. For each control, the state space consists of transient states plus a
single communicating aperiodic class.

A1.2. C(x,·) and p(x,y|·) are continuous functions of the control parameter.

Other conditions as well as replacements for (A1.1) will be stated as needed.


For a feedback control u(·), let the cost function be

γ(x,u) = limsup_N (1/N) E_x^u Σ_{n=0}^{N−1} C(ξ_n, u(ξ_n)).     (1.1)

If the limsup is not a function of the initial condition, then it will be
written as γ(u). Assumption (A1.1) does cover many applications, and it
simplifies the development considerably. It does not hold in all cases of
interest for the approximating Markov chains. For example, in the singular
control problem of Chapter 8, one can easily construct controls for which
the assumption is violated. Nevertheless, the algorithms seem to perform
well for the "usual" problems on which we have worked. Weaker conditions
are used in Theorem 1.1, under a finiteness condition on U.

The Functional Equation for the Value Function for a Given Control.
Let us recapitulate some of the statements in Subsection 2.1.3. By
(A1.1), for each feedback control u(·), there is a unique invariant measure
which will be denoted by the row vector π(u) = {π(x,u), x ∈ S}, and (1.1)
does not depend on x. As noted in Subsection 2.1.3, there is a vector valued
function W(u) with values {W(x,u), x ∈ S}, such that (W(x,u), γ(u))
satisfy

W(x,u) = Σ_y p(x,y|u(x)) W(y,u) + C(x,u(x)) − γ(u).     (1.2)

In fact, the function defined by

W(x,u) = Σ_{n=0}^∞ E_x^u [C(ξ_n, u(ξ_n)) − γ(u)]     (1.3)

satisfies (1.2).
The solution to (1.2) is not unique, since if (W(u), γ(u)) is a solution and
k a constant, then the function defined by W(x,u) + k together with the
same γ(u) is also a solution. However, it is true that if we normalize the
set of possible solutions by selecting some state x_0 ∈ S and insisting that
W(x_0,u) = K, for some given constant K, then the solution (W(u), γ(u))
is unique. Generally, K is chosen to equal either zero or γ(u).
Let (W, γ̃) satisfy

W(x) = Σ_y p(x,y|u(x)) W(y) + C(x,u(x)) − γ̃.     (1.4)

Then γ̃ = γ(u). To see this, multiply each side of (1.4) on the left by π(x,u),
sum over x, and use the fact that π(u)P(u) = π(u).
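As an illustration, the pair (W(u), γ(u)) for a fixed feedback control can be computed by solving the linear system (1.2) together with a normalization such as W(x_0,u) = 0. The following is a minimal sketch, assuming a single recurrent class (A1.1) so that the augmented system is nonsingular; the function name and arguments are illustrative only.

    import numpy as np

    def average_cost_fixed_control(P, C, x0=0):
        """Solve (1.2) for a fixed feedback control: find (W, gamma) with
        W(x) = sum_y p(x,y|u(x)) W(y) + C(x,u(x)) - gamma, normalized by W(x0) = 0.

        P : (nS, nS) transition matrix P(u) under the fixed control
        C : (nS,)    one-step costs C(x, u(x))
        """
        nS = P.shape[0]
        # Unknowns z = (W(0), ..., W(nS-1), gamma).  The nS equations of (1.2)
        # plus the normalization W(x0) = 0 give a square linear system.
        A = np.zeros((nS + 1, nS + 1))
        b = np.zeros(nS + 1)
        A[:nS, :nS] = np.eye(nS) - P
        A[:nS, nS] = 1.0                 # the "+ gamma" term moved to the left side
        b[:nS] = C
        A[nS, x0] = 1.0                  # normalization W(x0) = 0
        z = np.linalg.solve(A, b)
        return z[:nS], z[nS]             # (W, gamma)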

The Dynamic Programming Equation. Define γ̄ = inf_u γ(u), where
the infimum is over all feedback controls u(·). Then there is an auxiliary
function V(·) such that the dynamic programming equation is

V(x) = min_{α∈U} [ Σ_y p(x,y|α) V(y) + C(x,α) − γ̄ ].     (1.5)
Theorem 1.1. Assume either condition (i), (ii), or (iii) below.

(i): U has only a finite number of points. For each pair of states x, y there
is an integer k and a feedback control u(·) such that

P{ξ_k = y | ξ_0 = x, u(·) used} > 0.

(ii): (A1.2) holds and S is a single recurrent and aperiodic class for each
feedback control.
(iii): (A1.1) and (A1.2) hold.
Then, there exists a unique solution γ̄ to (1.5).

The proof under (i) is [11, Proposition 4, Chapter 7]. The proof under
(iii) is Theorem 8.12 in [126]. The proof under the stronger condition (ii)
is in Theorem 3.1 below.

Theorem 1.2. Assume (A1.2) and suppose that there is a solution (V, γ̃)
to

V(x) = min_{α∈U} [ Σ_y p(x,y|α) V(y) + C(x,α) − γ̃ ].     (1.6)

Then there is a feedback control u(·) such that γ̃ = γ(u). Consider any
sequence ū = {u_0(·), u_1(·), ...} of feedback controls. Then

γ̃ ≤ liminf_N (1/N) E_x^ū Σ_{n=0}^{N−1} C(ξ_n, u_n(ξ_n)),   for all x.

The result remains true for any admissible control sequence. Thus γ̃ = γ̄.

Proof. Suppose that there are V and γ̃ satisfying (1.6). Let the minimum
be taken on at u(x) and let ū(·) be another feedback control. Then

V(x) = Σ_y p(x,y|u(x)) V(y) + C(x,u(x)) − γ̃
     ≤ Σ_y p(x,y|ū(x)) V(y) + C(x,ū(x)) − γ̃.     (1.6')

Define the |S|-dimensional column vector e = (1, ..., 1). By iterating the
equality in (1.6'), we get

V = P^n(u)V + Σ_{i=0}^{n−1} P^i(u)[C(u) − eγ̃].

Divide all terms by n and let n → ∞ to get eγ(u) − eγ̃ = 0; that is, γ̃ = γ(u).

Now, let ū = {u_0(·), u_1(·), ...} be any sequence of feedback controls.
Then, iterating (1.6') yields

V ≤ P(u_0)P(u_1)···P(u_n)V + Σ_{i=0}^{n} P(u_0)···P(u_{i−1})[C(u_i) − eγ̃].

Divide both sides by n, and let n → ∞ to get the lower bound asserted in
the theorem, which proves the optimality with respect to all sequences of
feedback controls. The additional details needed to get the result for an
arbitrary admissible control sequence will not be given. See [88, Section 6.6]
or [131]. ∎

A Centered Form of the Dynamical Equations. There are alternative
"centered" forms of (1.2) and (1.6) which will be useful in the
next few sections. Suppose that, for a feedback control u(·), we normalize
the function W(u) in (1.2) by choosing a particular state x_0 and setting
W(x_0,u) = γ(u). Define the |S|-dimensional column vector C(u) =
{C(x,u(x)), x ∈ S}. Then we can write (1.2) as

W(u) = P(u)w(u) + C(u),
w(u) = W(u) − eW(x_0,u).     (1.7)

Because P(u)e = e, we see that we must have γ(u) = W(x_0,u). Analogously,
the dynamic programming equation (1.6) can be written as

V = min_{u(x)∈U} [P(u)v + C(u)],
v = V − eV(x_0).     (1.8)

In this form, γ̄ = V(x_0). In (1.8), the minimum is taken line by line, as
usual.

7.2 A Jacobi Type Iteration


One way of looking at the average cost per unit time problem is as a
control problem over a very large but finite time interval. Consider a control
problem over the time interval [0, N] with cost function

W(x, N, u) = E_x^u Σ_{i=0}^{N} C(ξ_i, u(ξ_i)).     (2.1)

The associated dynamic programming equation is (see Section 2.5)

V(x, N+1) = min_{α∈U} [ Σ_y p(x,y|α) V(y,N) + C(x,α) ],     (2.2)

where V(x,0) = 0. For large values of N, the cost function (2.1) differs from
(1.1) mainly in the normalization by N. Thus it is intuitively reasonable
to expect that the optimal control for the ergodic problem will be well
approximated by that for the finite time problem if N is large, and that
the normalized cost function V(x,N)/N will converge to γ̄ as N → ∞.
Under appropriate conditions this does happen, and we now give a more
precise statement of the result.
Let {u_n(·), n < ∞} be a sequence of admissible feedback controls, where
u_i(·) is to be used at time i. Define the (n+1)-step probability transition
matrix

{p_n(x,y|u_0, ..., u_n), x, y ∈ S} = P(u_0)···P(u_n).

A2.1. There is a state x_0, an n_0 < ∞, and an ε_0 > 0 such that for all
u_0(·), ..., u_{n_0}(·) and all x ∈ S

P{ξ_{n_0} = x_0 | ξ_0 = x, u_0, ..., u_{n_0−1} used} > ε_0.

Define vectors V_n and v_n recursively by V_0(x) = v_0(x) = 0, and

V_{n+1} = min_{u(x)∈U} [P(u)v_n + C(u)],
v_n = V_n − eV_n(x_0).     (2.3)

Note the similarity between (2.2) and (2.3). Equation (2.3) is essentially
(2.2) with a centering or normalization at each time step in order to keep
the cost from blowing up.

Theorem 2.1. Assume (A1.2) and either of (i)-(ii) below.

(i): (A2.1) holds.
(ii): (A1.1) holds for every optimal feedback control. For some optimal
feedback control, the chain is also aperiodic.

Then V_n(x_0) converges to γ̄ and the dynamic programming equation (1.6)
holds.

Remarks. The result under condition (i) is due to White ([11],[88, Section
6.5], [126, Theorem 8.18], [154]) and that under (ii) is in [53]. See the survey
[126] for additional information and references. Note that the algorithm
defined by (2.3) is a Jacobi type iteration. There does not seem to be a
Gauss-Seidel version known at this time, although there is for the centered
form introduced below. Because the approximation in policy space type
methods seem to be preferable for most problems, (2.3) is not widely used.
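For completeness, here is a sketch of the iteration (2.3). It assumes that the same finite control set applies at every state and that the transition matrices are stacked one per control value; the names and the stopping rule are illustrative.

    import numpy as np

    def relative_value_iteration(P, C, x0=0, n_iter=2000, tol=1e-10):
        """The Jacobi-type iteration (2.3): V_{n+1} = min_u [P(u) v_n + C(u)],
        v_n = V_n - e V_n(x0).  V_n(x0) then approximates the optimal cost.

        P : (nU, nS, nS) transition matrices, one per control value
        C : (nU, nS)     one-step costs, one row per control value
        """
        nU, nS, _ = P.shape
        v = np.zeros(nS)
        gamma = 0.0
        for _ in range(n_iter):
            # the minimum over control values is taken "line by line", i.e., per state
            Q = np.einsum('uxy,y->ux', P, v) + C        # Q[u, x]
            V = Q.min(axis=0)
            gamma_new = V[x0]
            v_new = V - gamma_new
            if np.max(np.abs(v_new - v)) < tol and abs(gamma_new - gamma) < tol:
                v, gamma = v_new, gamma_new
                break
            v, gamma = v_new, gamma_new
        policy = Q.argmin(axis=0)
        return gamma, v, policy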

7.3 Approximation in Policy Space


The basic approximation in policy space algorithm is the following. Suppose
that a feedback control u_n(·) is given, and let (W(u_n), γ(u_n)) denote the
corresponding solution to (1.2) or, equivalently, (1.7). The algorithm defines
the sequence of feedback controls u_{n+1}(·) by (6.2.1); that is, by

u_{n+1}(x) = arg min_{α∈U} [ Σ_y p(x,y|α) W(y, u_n) + C(x,α) ]     (3.1)

or, alternatively, by

u_{n+1}(x) = arg min_{α∈U} [ Σ_y p(x,y|α) w(y, u_n) + C(x,α) ].     (3.2)
We can now state the following theorem. Other results are in the references.

Theorem 3.1. Assume (A1.2), and that S is a single recurrent and aperiodic
class under each feedback control. Then there is γ̃ such that γ(u_n) ↓ γ̃,
and there is a V such that (V, γ̃) satisfy (1.6) or, equivalently, (1.8).

Proof. We show that (1.6) holds. By the minimization in (3.1) and an
argument like that below (1.6), it follows that γ(u_{n+1}) ≤ γ(u_n). Hence
there is some γ̃ such that γ(u_n) ↓ γ̃.
We next show that the associated sequence {W(u_n), n < ∞} can be
assumed to be bounded. Because any solution W(u) will have the form
(1.3) modulo a vector with identical components, we can suppose that the
W(u) are given by (1.3). Note that the absolute value of the subdominant
eigenvalue of P(u) is bounded away from unity uniformly over all feedback
controls u(·). This assertion follows from the following argument. First,
note that the hypotheses imply that there is only one eigenvalue whose
norm equals one (which is the eigenvalue of value one). Suppose that for

some sequence {u_n(·)} the absolute values of the sequence of subdominant
eigenvalues of P(u_n) converges to one. We can suppose (by choosing a
subsequence if necessary) that u_n(·) converges to a control u(·). Then P(u)
has at least two eigenvalues whose norms equal one, a contradiction to the
condition that there is a single recurrent class under u(·). It follows from
the assertion that the W(u) defined by (1.3) are bounded uniformly in u(·).
Let n_k → ∞ be a sequence such that u_{n_k}(·), u_{n_k+1}(·), W(u_{n_k}), respectively,
converge to limits ū(·), û(·), W̄, respectively. We have

eγ(u_{n_k}) = (P(u_{n_k}) − I)W(u_{n_k}) + C(u_{n_k}),

and, taking limits along the subsequence,

eγ̃ = (P(ū) − I)W̄ + C(ū),

which implies that γ̃ = γ(ū). On the other hand, by (3.1),

min_{u(x)∈U} [(P(u) − I)W(u_{n_k}) + C(u)] = (P(u_{n_k+1}) − I)W(u_{n_k}) + C(u_{n_k+1}).

Thus, in the limit,

(P(û) − I)W̄ + C(û) = min_{u(x)∈U} [(P(u) − I)W̄ + C(u)] ≤ (P(ū) − I)W̄ + C(ū) = eγ̃.     (*)

Because there are no transient states by hypothesis, π(x,û) > 0 for all
x ∈ S. Now multiply the left and right sides of (*) by π(û) to get
γ(û) ≤ γ̃ = γ(ū), where the inequality is strict unless the inequality in (*) is
an equality in every component. But γ(û) = γ(ū) = lim_n γ(u_n), which implies
that (*) is an equality. Hence (V, γ̃) = (W̄, γ̃) satisfies (1.6). ∎

Remarks. Although it is not part of the theorem statement, it is a common
experience that the sequence of controls u_n(·) also converges. One generally
uses approximations to the solutions of (1.2) or (1.7) for the controls u_n(·).
Recall that the matrix P(u) has an eigenvalue of value unity. Thus it is
not a contraction, and the usual Jacobi or Gauss-Seidel iterative methods
cannot be used directly. The method of Theorem 2.1 involved a centering
or renormalization on each step, and that centering was crucial to the
convergence. There is a modified form of the centering in (1.7) which will
allow the use of the Jacobi, Gauss-Seidel, and similar methods, and which
will be described next. The centered equations will be the basis of the
numerical algorithms to be discussed in Section 7.4.

A "Centered" Form of (1.7). Fix x_0 ∈ S and let u(·) be a feedback
control. Let P(x_0,u) denote the row vector which is the x_0-th row of P(u).
As in (1.7), we center W(u) by setting γ(u) = W(x_0,u). Define the centered
transition matrix P_e(u) and cost vector C_e(u) by

P_e(u) = P(u) − eP(x_0,u) = {p_e(x,y|u(x)), x, y ∈ S},

C_e(u) = C(u) − eC(x_0,u(x_0)).


Equation (1.7) implies that

w(u) + eW(x_0,u) = P(u)w(u) + C(u),
W(x_0,u) = P(x_0,u)w(u) + C(x_0,u(x_0)).     (3.3)

Using these expressions, we can rewrite (1.7) as

w(u) = P_e(u)w(u) + C_e(u).     (3.4)

Note that the x_0-th row of P_e(u) and the x_0-th component of C_e(u) are zero.
Given the value of w(u), both W(u) and γ(u) can be calculated. The key property of the
matrix P_e(u) is the fact that its spectral radius is less than unity, so that the
iterative algorithms of Chapter 6 can be used for the solution of (3.4). This
centering idea seems to have originated in [12] in their work on aggregation
algorithms for the ergodic problem.

Lemma 3.2. If the chain under the feedback control u(·) contains only
transient states and a single aperiodic recurrent class, then the spectral
radius of P_e(u) is less than unity.

Proof. Note that P(x_0,u)e = 1 and P(u)e = e. Thus,

P_e^2(u) = P_e(u)P(u) − [P(u) − eP(x_0,u)]eP(x_0,u) = P_e(u)P(u).

Thus, for n > 1, P_e^n(u) = P_e(u)P^{n−1}(u). Let {λ_i, v_i} denote the eigenvalues
and associated eigenvectors of P(u), where we order them such that λ_1 = 1
and v_1 = e. For a given vector x, define the coefficients c_i by x = Σ_i c_i v_i.
Then

P_e^n(u)x = P_e(u)P^{n−1}(u)x = Σ_i P_e(u) c_i λ_i^{n−1} v_i.

Because v_1 = e and P_e(u)e = 0, we see that the dominant eigenvalue of
P_e(u) is the subdominant eigenvalue of P(u). ∎
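The centering and the contraction property can be examined numerically. The following sketch builds P_e(u) and C_e(u), checks that the spectral radius of P_e(u) is below one (Lemma 3.2), solves (3.4) directly, and recovers γ(u) from (3.3). It assumes P is the transition matrix under a fixed feedback control and C the corresponding cost vector; the names are illustrative.

    import numpy as np

    def centered_system(P, C, x0=0):
        """Centered transition matrix and cost of Section 7.3:
        P_e(u) = P(u) - e P(x0, u),  C_e(u) = C(u) - e C(x0)."""
        e = np.ones(P.shape[0])
        Pe = P - np.outer(e, P[x0])
        Ce = C - e * C[x0]
        return Pe, Ce

    def solve_centered(P, C, x0=0):
        """Solve w = P_e w + C_e (equation (3.4)), then recover gamma(u) via (3.3)."""
        Pe, Ce = centered_system(P, C, x0)
        # Lemma 3.2: the spectral radius of P_e is < 1, so I - P_e is invertible
        # and the simple iterative schemes of Chapter 6 converge.
        assert np.max(np.abs(np.linalg.eigvals(Pe))) < 1.0
        w = np.linalg.solve(np.eye(P.shape[0]) - Pe, Ce)
        gamma = P[x0] @ w + C[x0]        # (3.3): W(x0,u) = P(x0,u) w + C(x0,u(x0))
        return w, gamma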

7.4 Numerical Methods for the Solution of (3.4)


In this section, we comment briefly on the use of the Jacobi (6.2.7), Gauss-
Seidel (6.2.10), accelerated Gauss-Seidel (6.4.3), semi-accelerated Gauss-
Seidel (6.4.2), and other methods of Chapter 6. Assumptions (A1.1) and
(A1.2) will continue to be used.
Given w_0(u), the Jacobi iteration for solving (3.4) is

w_{n+1}(x,u) = Σ_y p_e(x,y|u) w_n(y,u) + C_e(x,u).     (4.1)

The Gauss-Seidel iteration for solving (3.4) is

w_{n+1}(x,u) = Σ_{y<x} p_e(x,y|u) w_{n+1}(y,u) + Σ_{y≥x} p_e(x,y|u) w_n(y,u) + C_e(x,u),     (4.2)

where it is supposed that the states are ordered in some way. Recall that
γ(u) = W(x_0,u). Even if the transition probabilities for the approximating
Markov chain are selected such that p(x,x|u(x)) = 0 for all x (see the discussion
concerning normalization in Subsection 5.2.2), due to the centering
we will not have p_e(x,x|u) = 0 for any state x for which p(x_0,x|u(x_0)) > 0.
From a numerical point of view, it is preferable to "normalize" the equations
(4.1) and (4.2) by using the forms

w_{n+1}(x,u) = [ Σ_{y≠x} p_e(x,y|u) w_n(y,u) + C_e(x,u) ] / (1 − p_e(x,x|u)),     (4.3)

w_{n+1}(x,u) = [ Σ_{y<x} p_e(x,y|u) w_{n+1}(y,u) + Σ_{y>x} p_e(x,y|u) w_n(y,u)
              + C_e(x,u) ] / (1 − p_e(x,x|u)).     (4.4)
Similar remarks hold for the various "accelerated" forms of these algorithms.
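A sketch of one normalized Gauss-Seidel sweep (4.4) follows; the sweep order, the stopping test, and the function names are illustrative, and the acceleration parameter is omitted.

    import numpy as np

    def gauss_seidel_sweep(Pe, Ce, w):
        """One normalized Gauss-Seidel sweep (4.4) for w = P_e w + C_e.
        States are swept in their natural order; already-updated values are
        used for y < x and old values for y > x."""
        nS = len(w)
        for x in range(nS):
            s = Pe[x] @ w - Pe[x, x] * w[x] + Ce[x]     # sum over y != x
            w[x] = s / (1.0 - Pe[x, x])
        return w

    def solve_by_gauss_seidel(Pe, Ce, n_sweeps=500, tol=1e-10):
        w = np.zeros(len(Ce))
        for _ in range(n_sweeps):
            w_old = w.copy()
            w = gauss_seidel_sweep(Pe, Ce, w)
            if np.max(np.abs(w - w_old)) < tol:
                break
        return w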
Because the matrix P_e(u) depends on the chosen centering state x_0, it
is conceivable that the eigenvalues will depend on x_0. Lemma 3.2 implies
that this is not the case for the Jacobi method. But the eigenvalues of
the matrices which define the Gauss-Seidel type iterations do depend on
the chosen state, and this dependence can be quite important. The type
of dependence which seems to be typical is presented in Tables 4.1 and
4.2 below for one particular case, which is an approximating chain for a
diffusion process. Define the one dimensional system

dx = cx dt + dw,   c > 0,

on the interval [0,1], with instantaneously reflecting boundaries. The state
space for the Markov chain approximation is {h, 2h, ..., 1−h}, where 1/h
is assumed to be an integer. Let ch ≤ 1, and define x_0 = n_0 h for some
integer n_0. Define the transition probabilities p^h(x,y) by the "central finite
difference formula" (see Section 5.1)

p(x, x+h) = (1 + chx)/2,   x = h, ..., 1−2h,

p(x, x−h) = (1 − chx)/2,   x = 2h, ..., 1−h,

p(h, h) = 1 − p(h, 2h),   p(1−h, 1−h) = 1 − p(1−h, 1−2h).


In the tables, h = 1/20. The first three lines of Table 4.1 compare the
spectral radius of the iteration matrix with an acceleration parameter ω =
1.2 and x_0 equaling h (the left hand state), 10h (the center state), and 19h
(the right hand state). The advantages of the AGS over the SAGS over
the GS over the Jacobi are clear. Note the improvement due to the use of
the normalization (although the effects of the normalization will be less in
higher dimensions). The use of the left hand states for the centering state is
preferable. This seems to be due to the fact that the Gauss-Seidel method
starts iterating from the left. A comparison of the first three and the second
three lines in the table and with the unaccelerated GS column shows the
advantages of acceleration. Of course, if the acceleration parameter is too
large, the iteration becomes unstable. The choice of the centering state is
important in general. It should be a state which has a "high connectivity"
from other states.
The multigrid method is used exactly as in Chapter 6.

n_0   ω     J      GS     GS(norm)  SAGS(norm)  AGS(norm)

 1    1.2   .987   .975   .963      .956        .920
10    1.2   .987   .977   .971      .966        .965
19    1.2   .987   .978   .974      .969        .971

 1    1.4   .987   .975   .963      .949        .909
10    1.4   .987   .977   .971      .960        .953
19    1.4   .987   .978   .974      .964        .963

Table 4.1. Spectral radius, c = 1.

n_0   ω     J      GS     GS(norm)  SAGS(norm)  AGS(norm)

 1    1.2   .986   .973   .965      .958        .939
10    1.2   .986   .976   .970      .964        .964
19    1.2   .986   .977   .973      .967        .971

Table 4.2. Spectral radius, c = 0.
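The flavor of these tables can be reproduced by building the chain for dx = cx dt + dw described above and computing spectral radii of the corresponding iteration matrices. The sketch below is illustrative only: it forms the centered matrix for a chosen centering state and returns the spectral radius of the plain Jacobi matrix (which is P_e itself) and of the normalized Gauss-Seidel matrix; the accelerated variants are omitted.

    import numpy as np

    def chain_for_example(h=0.05, c=1.0):
        """Transition matrix of the approximating chain for dx = c x dt + dw on
        [0, 1] with reflection, on the grid {h, 2h, ..., 1-h} (central differences)."""
        xs = np.arange(h, 1.0 - h / 2, h)
        n = len(xs)
        P = np.zeros((n, n))
        for i, x in enumerate(xs):
            up, dn = (1 + c * h * x) / 2, (1 - c * h * x) / 2
            if i + 1 < n: P[i, i + 1] = up
            else:         P[i, i]    += up      # p(1-h, 1-h) = 1 - p(1-h, 1-2h)
            if i - 1 >= 0: P[i, i - 1] = dn
            else:          P[i, i]    += dn     # p(h, h) = 1 - p(h, 2h)
        return xs, P

    def iteration_spectral_radii(P, x0_index):
        """Spectral radii of iteration matrices for the centered system
        w = P_e w + C_e: the plain Jacobi matrix is P_e itself; the normalized
        Gauss-Seidel matrix is (D - L)^{-1} U for the splitting I - P_e = D - L - U."""
        n = P.shape[0]
        Pe = P - np.outer(np.ones(n), P[x0_index])
        M = np.eye(n) - Pe
        D = np.diag(np.diag(M))
        L = -np.tril(M, -1)
        U = -np.triu(M, 1)
        rho = lambda A: np.max(np.abs(np.linalg.eigvals(A)))
        return rho(Pe), rho(np.linalg.solve(D - L, U))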

7.5 The Control Problem for the Approximating


Markov Chain
We will adapt the discussion concerning the problem formulation and nu-
merical algorithms in Sections 7.1 to 7.4 to the approximating Markov
chain. First, the dynamical equation for the value function for a fixed con-
trol, and the dynamic programming equation for the optimal value will be

given. Assumptions (A1.1) and (A1.2) will be used for the approximating
chains for the values of h of interest. In this section, we let the interpolation
interval Δt^h(x,α) depend on the state and control in order to get a better
understanding of the relationship between the approximating chain and the
original diffusion process. It will be seen later that the approximation in
policy space algorithm requires that the intervals not depend on either the
state or control. In Section 7.7, the transition probabilities will be modified
so that they are not state dependent.
Suppose the setup for a reflecting jump diffusion which was given in
Subsection 5.8.3. The process x(·) is constrained to stay in a compact set
G which is the closure of its interior. G_h denotes the set of states for the
approximating chain in G, and ∂G_h^+ denotes the reflecting boundary for the
chain, which is disjoint from G_h. Define the state space S_h = G_h ∪ ∂G_h^+.
Recall that Δt^h(x,α) = 0 for x ∈ ∂G_h^+. Recall the notation: u_n^h = u(ξ_n^h),
Δt_n^h = Δt^h(ξ_n^h, u_n^h), and t_n^h = Σ_{i=0}^{n−1} Δt_i^h.
It will be seen below that the appropriate cost function for the Markov
chain, under an admissible control sequence u = {u_0(·), u_1(·), ...}, is

γ^h(x,u) = limsup_n [ E_x^u Σ_{i=0}^{n} k(ξ_i^h, u_i^h) Δt_i^h ] / [ E_x^u Σ_{i=0}^{n} Δt_i^h ].     (5.1)

There can also be costs associated with being on the reflecting boundary,
and an example will be given in Chapter 8. See also Section 7.8. In this
introductory discussion, we wish to keep the formulation simple. If (5.1)
does not depend on the initial condition, then we write it as γ^h(u). Suppose
that the feedback control u(·) is used at each time step and let π^h(u) =
{π^h(x,u), x ∈ S_h} denote the associated invariant measure. The ergodic
theorem for Markov chains [23] yields

γ^h(u) = lim_n [ E_x^u Σ_{i=0}^{n} k(ξ_i^h, u_i^h) Δt_i^h ] / [ E_x^u Σ_{i=0}^{n} Δt_i^h ]
       = [ Σ_x k(x,u(x)) Δt^h(x,u(x)) π^h(x,u) ] / [ Σ_x Δt^h(x,u(x)) π^h(x,u) ].     (5.2)
There is a vector valued function W^h(u) with values {W^h(x,u), x ∈ S_h},
such that (W^h(x,u), γ^h(u)) satisfy

W^h(x,u) = Σ_y p^h(x,y|u(x)) W^h(y,u) + [k(x,u(x)) − γ^h(u)] Δt^h(x,u(x)).     (5.3)

For x ∈ ∂G_h^+, (5.3) reduces to

W^h(x,u) = Σ_y p^h(x,y|u(x)) W^h(y,u).     (5.4)

In fact, analogously to the case in Section 7.1, the function defined by

W^h(x,u) = Σ_{n=0}^∞ E_x^u [k(ξ_n^h, u_n^h) − γ^h(u)] Δt_n^h     (5.5)

satisfies (5.3). As was the case in Section 7.1, the solution to (5.3) is unique
up to an additive constant on the function W^h(u). Let (W̃^h, γ̃^h) satisfy

W̃^h(x) = Σ_y p^h(x,y|u(x)) W̃^h(y) + [k(x,u(x)) − γ̃^h] Δt^h(x,u(x)).     (5.6)

Then, following the argument used below (1.4), γ̃^h = γ^h(u).


We next show why the cost function (5.1) or (5.2) is a natural choice for
the approximating Markov chain model. Fix the feedback control u(·). So
far, we have been working with the invariant measure of the approximating
chain. The amount of time that the chain spends at any one state once
it arrives there might depend on that state (and perhaps on the control
used in that state as well). Thus, the invariant measure of the chain needs
to be adjusted to account for this (possibly) state and control dependent
sojourn time, if it is to be used to get the mean value of the cost per
unit time for the continuous parameter interpolated process 7/Jh (·). That
is, because ~th(x,u(x)) depends (or might depend) on the state x, and on
the control used at that state, the invariant measure for the chain needs to
be ''weighted" in order for it to account for the time that the interpolated
process (which is the process of primary interest) spends at each state x.
Define the measure J..Lh(u) = {JLh(x,u),x E Sh} by

(5.7)

Note that the value is zero for the "instantaneous" reflecting states. Now,
(5.2) can be written in the simpler form

(5.8)
X

Recall the definition of the interpolation ξ^h(·) from Chapter 4, and define
the interpolation u^h(·) here by: u^h(t) = u_n^h for t ∈ [t_n^h, t_{n+1}^h). The ergodic
theorem for Markov chains also implies the following identities (all with
probability one):

lim_T (1/T) ∫_0^T k(ξ^h(s), u^h(s)) ds = Σ_x k(x,u(x)) μ^h(x,u) = γ^h(u).     (5.9)

A similar expression can be obtained for ψ^h(·).
The expressions (5.9) make it clear that the cost function (5.1) or (5.2)
is the correct one for the approximating Markov chain model. The scaling
by the mean interpolation interval [in the denominators of (5.1) and
(5.2)] accounts for the (possibly) state and control dependent time that the
process spends at each state until the next transition. It also transforms
[as in (5.7)] the invariant measure for the chain into one which measures
the relative time that the interpolated process spends at each state of the
interpolated process. It will be seen in Section 7.6 that μ^h(u) is just the
invariant measure for the continuous parameter Markov chain interpolation
ψ^h(·), which was defined in Chapter 4. Let us extend the measure μ^h(u)
to a measure on the Borel subsets of the "continuous" state space G of
the original reflected process x(·) by defining μ^h(A,u) = Σ_{x∈A} μ^h(x,u).
In fact, it will be shown in Chapter 11 that this extended μ^h(u) is an
approximation to an invariant measure of x(·).
Generally, it is much harder to calculate the invariant measure μ^h(u)
than it is to calculate the values of explicit functionals such as

Σ_x k(x,u(x)) μ^h(x,u).

The numerical methods for the latter calculation converge much faster than
do the numerical methods which might be used for the calculation of the
invariant measure itself. Because of this, even if one is interested in the
invariant measure of x(·), we might content ourselves with the values of
approximations to "stationary expectations" of a small number of functions.
An interesting alternative for numerically approximating the invariant
measure for certain classes of "heavy traffic" problems is the QNET method
of [33, 35, 36, 65], although it is not applicable to state dependent or control
problems.

The Dynamic Programming Equation. Define γ̄^h = inf_u γ^h(u), where
the infimum is over all feedback controls u(·). Then the dynamic programming
equation is

V^h(x) = min_{α∈U} [ Σ_y p^h(x,y|α) V^h(y) + [k(x,α) − γ̄^h] Δt^h(x,α) ],   x ∈ S_h.     (5.10)

The fact that the infimum γ̄^h of the costs (together with an auxiliary
function V^h) satisfies (5.10) as well as the convergence of the approximation
in policy space algorithm can be shown by the methods used in Theorems
1.2 and 3.1. Here we will show only that (5.10) is the correct dynamic
programming equation. Suppose that there are V^h and γ̃^h satisfying

V^h(x) = min_{α∈U} [ Σ_y p^h(x,y|α) V^h(y) + [k(x,α) − γ̃^h] Δt^h(x,α) ]     (5.11)

for all x ∈ S_h. Let the minimum be taken on at u(x) and let ū(·) be
another feedback control. Then we claim that γ̃^h = γ^h(u), which equals the
minimum cost. Following the method used to deal with (1.6),

V^h(x) = Σ_y p^h(x,y|u(x)) V^h(y) + [k(x,u(x)) − γ̃^h] Δt^h(x,u(x))
       ≤ Σ_y p^h(x,y|ū(x)) V^h(y) + [k(x,ū(x)) − γ̃^h] Δt^h(x,ū(x)).

The first equality implies that γ̃^h = γ^h(u), as shown in connection with
(1.4). Next, multiply the left and right sides of the above inequality by
π^h(x,ū), sum over x, and use the fact that π^h(ū) is invariant for the transition
matrix P^h(ū) to get that

γ̃^h Σ_x Δt^h(x,ū(x)) π^h(x,ū) ≤ Σ_x k(x,ū(x)) Δt^h(x,ū(x)) π^h(x,ū).

But this last inequality implies that γ^h(ū) ≥ γ̃^h = γ^h(u), which proves the
claim.

Computation of γ^h(u). Let u(·) be a feedback control and assume (A1.1)
and (A1.2). Choose a centering state x_0, and write (analogously to (3.4)),
for x ∈ G_h ∪ ∂G_h^+,

C^h(x,u(x)) = k(x,u(x)) Δt^h(x,u(x)),
P_e^h(u) = P^h(u) − eP^h(x_0,u),     (5.12)
C_e^h(u) = C^h(u) − eC^h(x_0,u(x_0)),

w^h(u) = P_e^h(u) w^h(u) + C_e^h(u).     (5.13)

Then

P^h(x_0,u) w^h(u) + C^h(x_0,u(x_0)) = Σ_x k(x,u(x)) Δt^h(x,u(x)) π^h(x,u).     (5.14)

Now repeat (5.12) for C^h(x,u(x)) = Δt^h(x,u(x)), yielding the new value

Σ_x Δt^h(x,u(x)) π^h(x,u).     (5.15)

Divide (5.14) by (5.15) to get γ^h(u). So we can calculate the cost under
each control u(·). Nevertheless, to use the approximation in policy space
method, we need to use a constant interpolation interval for x ∈ G_h. See
Section 7.7.
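The two-solve procedure (5.12)-(5.15) can be sketched as follows for a fixed feedback control; P, k, and dt are assumed given as arrays, and the centering state is arbitrary subject to (A1.1). The function name is illustrative.

    import numpy as np

    def ergodic_cost_state_dependent_dt(P, k, dt, x0=0):
        """Mean cost per unit time (5.2) when the interpolation interval depends
        on the state: solve the centered equation (5.13) twice, once with running
        cost k*dt and once with cost dt, and take the ratio as in (5.14)-(5.15).

        P  : (nS, nS) transition matrix p^h(x, y | u(x)) under the fixed control
        k  : (nS,)    running costs k(x, u(x))
        dt : (nS,)    interpolation intervals Delta t^h(x, u(x))
        """
        nS = P.shape[0]
        e = np.ones(nS)
        Pe = P - np.outer(e, P[x0])

        def centered_value(cost):
            Ce = cost - e * cost[x0]
            w = np.linalg.solve(np.eye(nS) - Pe, Ce)
            return P[x0] @ w + cost[x0]      # the value at the centering state

        num = centered_value(k * dt)         # approximates (5.14)
        den = centered_value(dt)             # approximates (5.15)
        return num / den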

7.6 The Continuous Parameter Markov Chain


Interpolation
Recall the continuous parameter approximating Markov chain ψ^h(·) introduced
in Section 4.3. Let u(·) be a feedback control (as usual, it does not
depend on time). By the law of large numbers for continuous parameter
Markov chains [23], one can show that the μ^h(u) defined by (5.7) is an invariant
measure for ψ^h(·), if π^h(u) is an invariant measure for {ξ_n^h, n < ∞}.
In order to prove this assertion, first recall the definitions in Section 4.3 of
τ_n^h, the times of change of ψ^h(·), and of the differences Δτ_n^h = τ_{n+1}^h − τ_n^h.
Recall that the mean holding time in state x for ψ^h(·) is Δt^h(x,u(x)). The
mean amount of time that ψ^h(·) takes the value x is (where the limits are
with probability one)

lim_n [ Σ_{i=0}^{n−1} Δτ_i^h I_{{ξ_i^h = x}} ] / τ_n^h = [ Δt^h(x,u(x)) π^h(x,u) ] / [ Σ_y Δt^h(y,u(y)) π^h(y,u) ],     (6.1)

which is just μ^h(x,u). Hence, μ^h(·) is an invariant measure for ψ^h(·) under
control u(·). It is unique if there is a unique invariant measure for {ξ_n^h, n <
∞} under u(·).
The mean cost per unit time for ψ^h(·) under u(·) can be written as

lim_t E_x^u (1/t) ∫_0^t k(ψ^h(s), u(ψ^h(s))) ds = ∫ k(x,u(x)) μ^h(dx,u) = γ^h(u).     (6.2)

We can also write

γ^h(u) = E_{μ^h(u)} k(ψ^h(0), u(ψ^h(0))),     (6.3)

where E_{μ^h(u)} is the expectation for the stationary process, under the control
u(·). Again, we note that boundary costs can be added. See Section 7.8 and
Chapter 8. Let us note the following without proof, because it partially
justifies considering the process ψ^h(·): The equivalence of (6.2) and (6.3)
and a weak convergence argument can be used to show that the weak limits
of the invariant measures μ^h(u) are invariant measures for the original
process x(·). See Chapter 11, where convergence results for the ergodic
cost problem will be given.
The interpolation ψ^h(·) is useful for the mathematical convergence analysis.
But the computational algorithm is the same as that for the discrete

parameter chain. The recursive equations for the average cost per unit time
under any given feedback control u(·) are the same for the discrete and the
continuous parameter chains. By comparing (6.2) and (6.3) with (5.1) and
(5.2), it is evident that the minimum of (5.1) for the discrete parameter
chain equals the minimum of

limsup_t E_x^u (1/t) ∫_0^t k(ψ^h(s), u(s)) ds

over the admissible controls.

7. 7 Computations for the Approximating Markov


Chain
There are two minor issues that need to be dealt with in order to get
a centered equation (with the desired contraction properties) for the cost,
analogous to (3.4). The first problem concerns the possible state and control
dependence of the interpolation intervals. The second issue concerns the
fact that the γ^h(u) in (5.3) is multiplied by the interpolation interval. With
these problems taken care of, we will get a centered equation of the form
(3.4) and the numerical methods of Chapter 6 can be used.

7. 7.1 Constant interpolation intervals


The Necessity of a Constant Interpolation Interval. A representation
of the mean cost per unit time by a formula analogous to (3.4), as
well as the algorithms in Section 7.4, cannot be used if the interpolation
intervals are dependent on the state or control, because then the e would
need to be replaced with a vector with components Δt^h(x,u(x)).
Furthermore, if the interpolation intervals are not independent of the
state and the control for each value of h of interest, then the solution γ^h(u)
to (5.3) is the ratio of expectations given by (5.2), which suggests that it
would be difficult to solve (5.3) by a "dynamical procedure," say, akin to
the Gauss-Seidel method. Indeed, it was shown at the end of Section 7.5
that the mean cost could be calculated, but by solving two cost equations.
In order to solve for the optimal ergodic cost for the approximating chain
by the computational methods of Chapter 6, it seems to be necessary at
this time to have a constant interpolation interval Δt^h(x,α) = Δt^h, for
x ∈ G_h.
For purposes of completeness, we will next discuss an example of the con-
struction of a locally consistent approximating chain whose interpolation
intervals do not depend on either the control or the state. The example
to follow concerns the approximation in the set G. Because the reflection

states are instantaneous and their interpolation intervals are zero, they will
be (easily) dealt with separately in Section 7.7.2 below.

Construction of an Approximating Chain with a Constant Interpolation
Interval. Example. It was seen in Section 5.2 how to construct
approximating chains with interpolation intervals which do not depend on
the state and control. In order to refresh our memory and to firmly associate
the procedure with the computational methods for the ergodic problem, we
will repeat the details for one method and one simple example here. For
concreteness, we work with the system

dx_1 = x_2 dt,
dx_2 = (a_1 x_1 + a_2 x_2) dt + u dt + σ dw,     (7.1)

where σ² ≥ 1, |u(t)| ≤ 1. Let a_i ≤ 0, so that the system (7.1) is stable. For
this illustrative example, the state space for x(·) will be the closed "square"
G = [−B, B] × [−B, B], for some B > 0, and with a reflecting boundary.
It is sufficient for our purposes to let G_h be the regular h-grid on G. Let
∂G_h^+, the reflecting boundary for the chain, be the points of the h-grid
outside G which are at most a distance h from G in any coordinate direction.
The procedure is as follows: First, we obtain any locally consistent chain,
using any of the methods of Chapter 5. If the interpolation intervals are
not constant, we then use the ideas of Section 5.2 to eliminate the state
and control dependence.
For the first step for this case, and in order to illustrate one concrete
example, let us follow the "operator splitting" procedure outlined in Subsection
5.2.3. Write x = (x_1, x_2). The differential operator of the process
defined by (7.1) is

(σ²/2) ∂²/∂x_2² + x_2 ∂/∂x_1 + (a_1 x_1 + a_2 x_2) ∂/∂x_2 + u(x) ∂/∂x_2.     (7.2)

Define

Q^h(x) = σ² + h|x_2| + h|a_1 x_1 + a_2 x_2|.

Here the normalization depends only on x, though in general we consider
normalizations of the form Q^h(x,α). For use below, define the maximum
of Q^h(x),

Q̄^h = max_x Q^h(x).
The transition probabilities will be defined by the finite difference method.


A central difference approximation (5.3.4) will be used for the second
derivative term and also for the first partial derivative involving the con-
trol term u(x) in (7.2), as in (5.1.18). A one sided difference approximation
(5.3.5) is used for the other two first derivative terms in (7.2). This procedure
yields the transition probabilities

p^h(x, x ± e_1 h | α) = n^h(x, x ± e_1 h | α) / Q^h(x),

where

n^h(x, x ± e_1 h | α) = h x_2^±,

and

p^h(x, x ± e_2 h | α) = n^h(x, x ± e_2 h | α) / Q^h(x),

where

n^h(x, x ± e_2 h | α) = σ²/2 ± hα/2 + h(a_1 x_1 + a_2 x_2)^±,

Δt^h(x) = h²/Q^h(x).

The transition probabilities from x to all nonlisted y are zero. The above
constructed transition probabilities are locally consistent with the diffusion
model (7.1).
Next, let us construct transition probabilities where the interpolation
intervals are constant. This is easily done as follows, using the p^h(x,y|α)
given above. Define

p̄^h(x, x ± e_i h | α) = n^h(x, x ± e_i h | α) / Q̄^h,   i = 1, 2,     (7.3)

p̄^h(x, x | α) = 1 − Q^h(x)/Q̄^h,

Δt̄^h(x) = Δt̄^h = h²/Q̄^h.

The chain with these transition probabilities and interpolation interval is
also locally consistent with the diffusion (7.1). The transition probabilities
(7.3) are used for x ∈ G_h. The transition probabilities for the reflecting
states ∂G_h^+ are assumed to be locally consistent with the reflection directions
as in Section 5.7.
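A sketch of this construction for the example (7.1) follows. The grid handling and the function names are illustrative, and the reflecting states are not treated here (they are handled as in Section 5.7); only the interior formulas of (7.3) are implemented.

    import numpy as np

    def constant_dt_chain(h, a1, a2, sigma, B=1.0):
        """Locally consistent transition probabilities with a constant interpolation
        interval for the example (7.1) on the grid of G = [-B, B]^2, following (7.3).

        Returns the grid, a function mapping (x1, x2, alpha) to a list of
        ((y1, y2), probability) pairs, and the constant interval dt = h^2 / Qbar.
        """
        grid = np.arange(-B, B + h / 2, h)
        # Q^h(x) = sigma^2 + h|x2| + h|a1 x1 + a2 x2|; its maximum over the square
        Qbar = sigma**2 + h * B + h * (abs(a1) + abs(a2)) * B
        dt = h**2 / Qbar

        def transitions(x1, x2, alpha):
            drift2 = a1 * x1 + a2 * x2
            n = {
                (x1 + h, x2): h * max(x2, 0.0),
                (x1 - h, x2): h * max(-x2, 0.0),
                (x1, x2 + h): sigma**2 / 2 + h * alpha / 2 + h * max(drift2, 0.0),
                (x1, x2 - h): sigma**2 / 2 - h * alpha / 2 + h * max(-drift2, 0.0),
            }
            Qx = sum(n.values())                       # equals Q^h(x)
            probs = [(y, v / Qbar) for y, v in n.items()]
            probs.append(((x1, x2), 1.0 - Qx / Qbar))  # self-transition of (7.3)
            return probs

        return grid, transitions, dt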

7.7.2 The equation for the cost (5.3) in centered form


Suppose that we are given a locally consistent chain with interpolation
intervals constant for x ∈ G_h. This might be obtained as above from a
locally consistent chain with a nonconstant interpolation interval for x ∈
G_h. Because Δt^h(x) = 0 for x ∈ ∂G_h^+, the interpolation intervals are not

actually constant over all states. Because of this, we need to eliminate
the reflection states. This is easy to do and the algebraic details will be
given below. Once these reflecting states are eliminated, the analogue of
the centered form (3.4) for (5.3) can be used and solved with any of the
numerical methods alluded to in Section 7.4. Let p̄^h(x,y|α), x ∈ G_h, denote
the transition probabilities of a locally consistent Markov chain with constant
interpolation intervals Δt^h for x ∈ G_h. Let p^h(x,y) denote the transition
probabilities for x ∈ ∂G_h^+. Then (7.8) below is the appropriate centered
form of (5.3), which is analogous to (3.4).
Since the interpolation interval is zero for the reflecting states, we need
to eliminate them in order to get an analogue of the centered form (3.4)
under p̄^h(x,y|α). Fix the centering state x_0 to be an ergodic state not on the
boundary ∂G_h^+. Suppose, without loss of generality, that p̄^h(x_0,y|α) = 0
for all states y ∈ ∂G_h^+ and control actions α. Also, to slightly simplify the
development, suppose that the states in ∂G_h^+ communicate only to states
in G_h.
In order to eliminate the instantaneous reflecting boundary states, we
define the transition probability for x, y ∈ G_h as

p̃^h(x,y|α) = p̄^h(x,y|α) + Σ_{z∈∂G_h^+} p̄^h(x,z|α) p^h(z,y).

Now, for a feedback control u(·), define the transition matrix for the "reduced"
chain on G_h by

P̃^h(u) = {p̃^h(x,y|u(x)); x, y ∈ G_h}.

Then (5.3) can be rewritten as

W^h(u) = P̃^h(u) W^h(u) + C^h(u) − e γ^h(u) Δt^h,

where

C^h(x,u(x)) = k(x,u(x)) Δt^h,   x ∈ G_h,
W^h(u) = {W^h(x,u), x ∈ G_h},

since we supposed that there is no cost on the boundary. Now follow the
procedure which led to (3.4). Choose the centering value W^h(x_0,u) to
satisfy

γ^h(u) = W^h(x_0,u) / Δt^h.     (7.4)

Define the centered values

w^h(x,u) = W^h(x,u) − W^h(x_0,u),   x ∈ G_h.

Then

w^h(u) + eW^h(x_0,u) = P̃^h(u) w^h(u) + C^h(u),     (7.5a)

W^h(x_0,u) = P̃^h(x_0,u) w^h(u) + k(x_0,u(x_0)) Δt^h.     (7.5b)


Using the fact that p̃^h(x_0,y|u(x_0)) = p̄^h(x_0,y|u(x_0)), we thus obtain the
following form of (5.3):

w^h(x,u) = Σ_{y∈G_h} [p̃^h(x,y|u(x)) − p̄^h(x_0,y|u(x_0))] w^h(y,u)
         + [k(x,u(x)) − k(x_0,u(x_0))] Δt^h,     (7.6)

for x ∈ G_h.
It is not necessary to calculate the p̃^h(x,y|α) in order to evaluate the
right hand side. The computation can be done in terms of the original
p̄^h(x,y|α). First, rewrite the sum in (7.6) as

Σ_{y∈G_h} [p̄^h(x,y|u(x)) − p̄^h(x_0,y|u(x_0))] w^h(y,u)
+ Σ_{y∈G_h} Σ_{z∈∂G_h^+} p̄^h(x,z|u(x)) p^h(z,y) w^h(y,u).

For x ∈ ∂G_h^+, define w^h(x,u) by

w^h(x,u) = Σ_y p^h(x,y) w^h(y,u).     (7.7)

Then, for x ∈ G_h, we can write (7.6) as

w^h(x,u) = Σ_y [p̄^h(x,y|u(x)) − p̄^h(x_0,y|u(x_0))] w^h(y,u)
         + [k(x,u(x)) − k(x_0,u(x_0))] Δt^h.     (7.8)

Equation (3.4) is thus replaced by (7.4), (7.7), (7.8).

The Algorithm in Terms of the Original Data. Before proceeding, let
us backtrack a little. Often the p̄^h(x,y|α), x ∈ G_h, are obtained by starting
with a chain with nonconstant interpolation intervals, and then doing a
transformation, as in the last subsection. Since it is useful to write the actual
computational algorithm in terms of the original data, let us suppose
next that we start with locally consistent transition probabilities p^h(x,y|α)
for which the interpolation intervals (for x ∈ G_h) might not be constant,
and then obtain locally consistent transition probabilities p̄^h(x,y|α) with
constant interpolation intervals from them as outlined in the above subsection.
Below, it will be useful to represent the original p^h(x,y|α) in the
common form

p^h(x,y|α) = n^h(x,y|α) / Q^h(x,α),   x ∈ G_h.

We will write the algorithm in terms of the p^h(x,y|α), since that is the
starting data. Equation (7.8) will be rewritten in terms of p^h(x,y|α) to get
the final result (7.11) and (7.12).

In terms of the original transition probabilities p^h(x,y|α), we can rewrite
(7.8) as

w^h(x,u) = Σ_{y≠x} [n^h(x,y|u(x)) − n^h(x_0,y|u(x_0))] w^h(y,u)/Q̄^h
         + [1 − Q^h(x,u(x))/Q̄^h − n^h(x_0,x|u(x_0))/Q̄^h] w^h(x,u)     (7.9)
         + [k(x,u(x)) − k(x_0,u(x_0))] h²/Q̄^h,

where Q̄^h = sup_{x,α} Q^h(x,α). As will be seen below, it is not necessary to
calculate Q̄^h.

The Normalized Equation. Recall the definition of the normalized equation
in Section 5.2. The normalized form of (3.4) is

w(x,u) = [ Σ_{y≠x} p_e(x,y|u) w(y,u) + C_e(x,u) ] / [1 − p_e(x,x|u)].     (7.10)

It is generally preferred to use the normalized form in the Gauss-Seidel
or related relaxations. Let p̄^h(x,y|α) be the locally consistent transition
probability with constant interpolation interval for x ∈ G_h used above.
Suppose that a procedure such as in the last subsection is used to get them
from a locally consistent set p^h(x,y|α) for which the interpolation intervals
are not constant in G_h. Then p̄^h(x,x|α) > 0 for some states x ∈ G_h.
The normalized form of (7.7), (7.8) for this case will now be written in
terms of the original p^h(x,y|α). Because one generally tries to construct
the transition probabilities such that p^h(x,x|α) = 0, let us assume this
here. Using the normalization analogous to (7.10) and noting that

1 − [p̄^h(x,x|u(x)) − p̄^h(x_0,x|u(x_0))] = [Q^h(x,u(x)) + n^h(x_0,x|u(x_0))] / Q̄^h,

this yields, for x ∈ G_h,

w^h(x,u) = Σ_{y≠x} [n^h(x,y|u(x)) − n^h(x_0,y|u(x_0))] w^h(y,u) / [Q^h(x,u(x)) + n^h(x_0,x|u(x_0))]
         + [k(x,u(x)) − k(x_0,u(x_0))] h² / [Q^h(x,u(x)) + n^h(x_0,x|u(x_0))].     (7.11)

For the reflecting states,

w^h(x,u) = Σ_y p^h(x,y) w^h(y,u).     (7.12)

Recall that the actual average cost is given by (7.4).
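The resulting computation can be sketched as follows for a constant-interval chain with reflecting states: sweep (7.8) over the interior states and (7.7) over the reflecting states until convergence, then recover the average cost from (7.5b) and (7.4). The matrix and mask arguments, names, and stopping rule are assumptions of the sketch.

    import numpy as np

    def ergodic_cost_with_reflection(P, k, dt, interior, x0=0, n_sweeps=1000, tol=1e-10):
        """Sweep the centered equations (7.7)-(7.8) and recover the average cost
        via (7.5b) and (7.4).

        P        : (nS, nS) transition matrix under the fixed control; constant
                   interpolation interval dt on G_h, zero on the reflecting boundary
        k        : (nS,) running cost, zero on the reflecting boundary
        interior : boolean mask, True for states in G_h, False for reflecting states
        x0       : centering state (interior, not communicating with the boundary)
        """
        nS = P.shape[0]
        w = np.zeros(nS)
        for _ in range(n_sweeps):
            w_old = w.copy()
            for x in range(nS):
                if interior[x]:
                    # (7.8): centered row and centered cost
                    w[x] = (P[x] - P[x0]) @ w + (k[x] - k[x0]) * dt
                else:
                    # (7.7): reflecting states average their neighbors
                    w[x] = P[x] @ w
            if np.max(np.abs(w - w_old)) < tol:
                break
        W_x0 = P[x0] @ w + k[x0] * dt        # (7.5b)
        return W_x0 / dt                     # (7.4): average cost per unit time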

Approximation in Policy Space. For the approximation in policy space
method, given a current control u(·) and an associated approximation
w^h(u), the next control u_1(·) is defined by the minimizers in

min_{α∈U} [ Σ_y p̄^h(x,y|α) w^h(y,u) + k(x,α) Δt^h ],   x ∈ G_h.     (7.13)

7.8 Boundary Costs and Controls


In the last two sections, the cost associated with being on the boundary
was identically zero and no control was allowed there. In many examples,
one needs either or both boundary cost and control (Chapter 8,
[86, 106, 107, 153]). Here, we note only that for many routing problems
in telecommunication, control is on the "boundary" because that corresponds
to additional resources (say, a circuit) being unavailable. Only a
few "algorithm oriented" comments will be made here, with the interpolation
intervals Δt^h in G being constant. Let the running cost k(x,α) be
zero on ∂G_h^+, and define the boundary cost rate k_0(x,α), a continuous real
valued function which is zero for x ∈ G_h. The typical cost that is accrued
on the boundary is of the form k_0(x,α)h, and we will use this. For examples,
see the above cited references.
Now (5.1) is replaced by

limsup_n [ E_x^u Σ_{i=0}^{n} k(ξ_i^h, u_i^h) Δt_i^h + E_x^u Σ_{i=0}^{n} k_0(ξ_i^h, u_i^h) h ] / [ E_x^u Σ_{i=0}^{n} Δt_i^h ].     (8.1)

Equation (5.3) continues to hold for x ∈ G_h. Let p^h(x,y|α), x ∈ ∂G_h^+,
denote the controlled transition probabilities on the reflecting states. For
x ∈ ∂G_h^+, (5.4) is replaced by

W^h(x,u) = Σ_y p^h(x,y|u(x)) W^h(y,u) + k_0(x,u(x)) h.     (8.2)

The function defined by (5.5) continues to satisfy (5.3) and (8.2) if the cost
k(ξ_n^h, u_n^h)Δt_n^h is replaced by

k(ξ_n^h, u_n^h) Δt_n^h + k_0(ξ_n^h, u_n^h) h.

Now let us use the terminology of Section 7.7 above. Equations (7.8), (7.9),
and (7.11) continue to hold for x ∈ G_h. Recall that x_0 does not communicate
with states on ∂G_h^+ by assumption. For x ∈ ∂G_h^+, (7.12) is replaced
by

w^h(x,u) = Σ_y p^h(x,y|u(x)) w^h(y,u) + k_0(x,u(x)) h.     (8.3)

In fact, suppose that (7.8) and (8.3) hold. Recall that by (7.5b), and
using the fact that x_0 does not communicate to ∂G_h^+,

W^h(x_0,u) = Σ_y p̄^h(x_0,y|u(x_0)) w^h(y,u) + k(x_0,u(x_0)) Δt^h.

Let π^h(u) = {π^h(x,u), x ∈ G_h ∪ ∂G_h^+} be the invariant measure for the
chain {ξ_n^h, n < ∞} on the extended state space. Multiplying (7.8) and (8.3)
by π^h(x,u) appropriately, adding, and using the invariance of π^h(u) as in
Section 7.5 yields

W^h(x_0,u) = [ Σ_{x∈G_h} π^h(x,u) k(x,u(x)) Δt^h + Σ_{x∈∂G_h^+} π^h(x,u) k_0(x,u(x)) h ] / Σ_{x∈G_h} π^h(x,u).     (8.5)

Because the interpolation interval equals Δt^h for each state in G_h and is
zero on the boundary, the denominator of (8.1) equals Δt^h Σ_{x∈G_h} π^h(x,u),
and hence the cost (8.1) equals W^h(x_0,u)/Δt^h, as previously.

Approximation in Policy Space. The approximation in policy space
update uses (7.13) for x ∈ G_h and the minimizing controls in

min_{α∈U} [ Σ_y p^h(x,y|α) w^h(y,u) + k_0(x,α) h ],   x ∈ ∂G_h^+.     (8.6)

Similar considerations hold for other cost functionals, when there is
boundary cost and control and a reflecting process.
8
Heavy Traffic and Singular Control
Problems: Examples and Markov
Chain Approximations

Many of the process models which are used for purposes of analysis or con-
trol are approximations to the true physical model. Perhaps the dimension
of the actual physical model is very high, or it might be difficult to define a
manageable controlled dynamical (Markov) system model which describes
well the quantities of basic interest. Sometimes the sheer size of the problem
and the nature of the interactions of the component effects allows a good
approximation to be made, in the sense that some form of the central limit
theorem might be used to "summarize" or "aggregate" many of the ran-
dom influences and provide a good description of the quantities of interest.
Because these simpler or aggregate models will be used in an optimiza-
tion problem, we need to be sure that optimal or nearly optimal controls
(and the minimum value function, respectively) for the aggregated prob-
lem will also be nearly optimal for the actual physical problem (and a good
approximation to the associated minimum value function, respectively).
This chapter is concerned with two classes of problems where this av-
eraging effect can be used to simplify the model. The first class, the so-
called class of heavy traffic problems, originated in the study of uncontrolled
queueing systems [74, 113] and has applications to a broad class of such
systems which include certain communication and computer networks and
manufacturing systems. For these systems, "traffic" is heavy in the sense
that at some processors there is little idle time. The distributions of the
service and interarrival times might be dependent on the system state. The
dimension of the physical problem is usually enormous. With an appro-
priate scaling, a functional central limit theorem argument can be used to
show that the basic elements of the system can be well approximated by

the solution to a certain reflected diffusion process. The approximation also


allows us to compute good approximations to optimal controls and optimal
value functions for the original physical system. A full development of the
subject is in [100]. Owing to the fact that the limits are for multiple time
scale problems, related applications appear in singular perturbations [94].
The second class of problems to which this chapter is devoted (and which
includes part of the first class) is the class of so-called singular control prob-
lems. Perhaps, they are best understood as being limits of either a sequence
of discrete time problems, or as approximations to controlled continuous
time systems, where the control is well approximated by a sequence of (pos-
sibly) small impulses but the control "rate" might not be representable as
a bounded function. Loosely speaking, the cumulative control effort can be
represented as a nondecreasing process, but not necessarily as an integral
of a "control rate."
The interest in these classes of stochastic control problems has been
increasing rapidly, because they can be used to model many systems which
occur in manufacturing, communication, modeling of financial transactions,
and elsewhere. Numerical procedures are needed. The models have only
been studied in recent years and limited analytical results are available,
but the Markov chain approximation method is easily adapted and provides
useful procedures. Due to the relative newness of the models, and to the
fact that they often cannot be well understood without an understanding
of the underlying physical processes, in Section 8.1 we will give several
motivating examples. Because our interest is only in providing motivation
for these controlled system models, only a brief and formal discussion will
be given. The examples in Section 8.1 deal mainly with problems where
the actual model which is used for numerical purposes is obtained as an
"aggregation" of or other approximation to a complex physical problem.
In Section 8.2, Markov chain approximations for the "heavy traffic" case
will be discussed. These are actually special cases of those in Section 5.7 for
the reflected controlled diffusion, but it is useful to show how they specialize
to the cases at hand. Section 8.3 deals with Markov chain approximations
for the singular control problem.

8.1 Motivating Examples


8.1.1 Example 1. A simple queueing problem
In order to illustrate the size of the physical state space of the problem
which is of interest, let us consider the most classical of the models of
queueing theory, a single queue where the interarrival and service times
are all mutually independent and exponentially distributed, and only one
customer can arrive or be served at a time. In this first example, there is no
control [the so-called M/M/1 queue]. Let the buffer size be N; i.e., if there

are N customers in or waiting for service and a customer arrives, then an
arriving customer is rejected from the system. Let λ_a and λ_d denote the
arrival rate and service rate, respectively, and let π_i(t) denote the probability
that there are i customers in the system at time t. Then it is well
known that the π_i(·) satisfy the differential equations [84]

π̇_0 = λ_d π_1 − λ_a π_0,
π̇_i = λ_a π_{i−1} + λ_d π_{i+1} − (λ_a + λ_d) π_i,   i ≠ 0, N,     (1.1)
π̇_N = λ_a π_{N−1} − λ_d π_N.

The system (1.1) is one of the few queueing system equations that can be
solved. Nevertheless, it is still quite difficult to calculate the distributions
at finite times for large N. The situation is considerably worse if we allow
the distributions of the service or interarrival times to be other than expo-
nential (even if we are only concerned with the stationary distributions),
or if services or arrivals can occur in batches. The server can break down or
be otherwise unavailable at random. If the arrival or service rates depend
on the state of the system (e.g., faster service or slower arrivals for longer
queues), then the analytic solution of the counterpart of (1.1) can be ob-
tained at present only in a few special cases, and even numerical solutions
are generally difficult to get if N is not small. The required computation
rapidly gets out of bounds if the system is a network of interconnected
queues (except for the stationary solutions to the so-called Jackson cases).
If a control is added to (1.1) (e.g., a control on the service rate), then the
resulting control problem has a very high dimensional state space, even
under the "exponential" distribution. One is strongly tempted to use some
sort of approximation method to simplify the model for the queueing pro-
cess. Such approximations normally require that "certain parameters" be
either small or large. It was recognized in the late 1960's [74, 113] that if the
service and arrival rates are close to each other, then by a suitable scaling,
such a simplifying approximation can be obtained, at least for simple sys-
tems, and the approximating process was a reflected Wiener process with
drift. Subsequently, the same type of result was shown to be true in a fairly
general setting [129] and for controlled problems as well [107, 118]. See [100]
for a comprehensive development and many examples. The "aggregated" or
limit models often yield excellent approximations for the physical systems
under realistic conditions [34, 36, 68, 130]. A further motivation for seeking
aggregative or simplified models is that some sort of stochastic evolution
equation and a Markov model are very helpful if the control problem is to
be treated.

8.1.2 Example 2. A heuristic limit for Example 1


We now continue with the classical example defined above (1.1), and for-
mally discuss how we might get a simple approximating diffusion process.

Let us work in discrete time and suppose that only one arrival or departure
event can occur at each discrete time. Since we are concerned with
an approximation result, we will consider a family of queueing problems
parameterized by ε, with the probability of an arrival at any time and the
probability of completion of any service, conditioned on all past data, being
λ + b_a√ε and λ + b_d√ε, respectively, where the b_a and b_d can be either
positive or negative. With these values, the arrival and service rates are
within O(√ε) of one another. Marginal differences of the order of √ε in the
rates can make a considerable difference in the queue occupancy statistics.
As the traffic intensity increases (i.e., as ε → 0), the mean occupancies
increase, and it makes sense to scale the buffer size also. We let the buffer
size be scaled as Buffer Size = B/√ε, for some B > 0. If the buffer is of
smaller order, then the limit is concentrated at zero, and if it is of larger
order, it plays no role in the limit.
A simplifying convention concerning departures. Let Q_n^ε denote the num-
ber of customers waiting for or in service at discrete time n. There is a
convention which is used in writing the evolution equations which simpli-
fies the analysis considerably. In our current example, it is clear that the
arrival process is independent of the queue size and departure process, but
the departure process does depend on the state of the queue. In particular,
if the queue is empty then there can be no departure. The convention to be
used in writing the evolution equation is that even if the queue is empty,
the processor will keep working and sending out outputs at the usual rate.
But to keep the equations correct, a correction or "reflection" term (the
dY terms below) will be subtracted from the departure process whenever
such a "fictitious" output occurs. This device was used by [69, 74, 129] and
others. While it is not needed [100, Chapter 5], it makes our discussion of
the motivational example a little simpler. If there is an arrival in the midst
of such a "fictitious" interval, it is then supposed that the service time of
this arrival is just the residual service time for the current service interval.
This convention does not affect the form of the limit [74, 100, 107].

Input-Output Equation for the Queue; A Stochastic Evolution
Equation. We can now write the mass balance or input-output equation
as

Q_n^ε = Q_0^ε + Σ_{m=0}^{n−1} ΔA_m^ε − Σ_{m=0}^{n−1} ΔD_m^ε + Σ_{m=0}^{n−1} ΔI_m^{0,ε} − Σ_{m=0}^{n−1} ΔI_m^{B,ε},     (1.2)

where the ΔA_m^ε and ΔD_m^ε, respectively, are the number (zero or one) of
arrivals or departures, respectively, at time m. Keeping in mind the above
convention concerning departures and fictitious departures, the ΔD_m^ε are
the indicators of departure events assuming that the queue is never empty.
The ΔI_m^{0,ε} corrects for a "fictitious" output at time m if the queue is empty
but a "fictitious" output occurs at that time. The term ΔI_m^{B,ε} subtracts any
input which arrives at time m if the buffer is full at that time.

An Approximating Process. Equation (1.2) is an evolution equation for
the physical system. We would like to approximate it by a simpler process.
Equation (1.2) will next be rearranged in a way that will suggest the sort of
approximating limit process that can be expected. Write ΔA_m^ε and ΔD_m^ε
as the sum of a mean value and a random difference:

ΔA_m^ε = (λ + b_a√ε) + γ_m^{a,ε},
ΔD_m^ε = (λ + b_d√ε) + γ_m^{d,ε}.     (1.3)

For this introductory discussion, the random variables {γ_m^{a,ε}, γ_m^{d,ε}; m ≥ 0}
are assumed to be mutually independent with mean zero. For use below,
note that they can also be defined by

γ_m^{a,ε} = ΔA_m^ε − E[ΔA_m^ε | ΔA_i^ε, ΔD_i^ε, i ≤ m−1],     (1.4)

and similarly for γ_m^{d,ε}. Thus, the partial sums of the {γ_m^{α,ε}, m < ∞} form a
martingale sequence for α equal to a or d. Note that

E|γ_m^{a,ε}|² = λ(1−λ) + O(√ε),
E|γ_m^{d,ε}|² = λ(1−λ) + O(√ε).

Let [t/ε] denote the integer part of t/ε. Define the continuous parameter
scaled queue length process X^ε(·) by

X^ε(t) = √ε Q_{[t/ε]}^ε.

Define ΔY_m^ε = √ε ΔI_m^{0,ε} and ΔU_m^ε = √ε ΔI_m^{B,ε}. Then, letting the ratio t/ε
henceforth denote the integer part only, we can write

X^ε(t) = X^ε(0) + (b_a − b_d)t + √ε Σ_{m=0}^{t/ε−1} γ_m^{a,ε} − √ε Σ_{m=0}^{t/ε−1} γ_m^{d,ε}
       + Σ_{m=0}^{t/ε−1} ΔY_m^ε − Σ_{m=0}^{t/ε−1} ΔU_m^ε + "small error".     (1.5)

For motivational purposes, note first that the first two sums in (1.5)
tend to normally distributed random variables with mean zero and variance
λ(1−λ)t as ε → 0. More generally, when considered as functions of t, they
converge weakly to mutually independent Wiener processes w_a(·), w_d(·),
each with variance λ(1−λ)t. If X^ε(0) converges weakly (i.e., in distribution;
see Chapter 9 for the definitions) to a random variable X(0) as ε → 0, then
the sequence of processes defined in (1.5) converges weakly to limits that
satisfy

X(t) = X(0) + (b_a − b_d)t + w_a(t) − w_d(t) + Y(t) − U(t).     (1.6)


220 8. Heavy Traffic and Singular Control

TheY(-) is nondecreasing and can increase only when X(t) = 0. It is this


reflection term which keeps the queue from going negative. The term U(·)
is nondecreasing and can increase only when X (t) = B, and it keeps the
buffer from exceeding the normalized limit B. Thus for small E, the queue-
ing process can be well represented by a rather simple reflected Brownian
motion process with drift ba - bd.
Note that (1.6) and the other models of this section are special cases of
the Skorokhod Problem formulation of the reflected diffusion.
For small E, many functionals of interest for the physical process xe (·)
can be well approximated by the same functionals of the simpler limit de-
fined by (1.6). For example, the distribution of xe(t) and ue(t) can be well
approximated by those of X (t) and U (t), respectively. Many functionals of
the entire paths can also be approximated. For example, the distribution
of the first hitting time of some level B1 :<::; B by xe(-) can be approxi-
mated by the distribution of the first hitting time of that level by X (·).
For another example, y't times the number of customers lost by time t/E
is approximated by U (t).

A Controlled Version of Example 2. The model {1.6) can also be used


for control purposes. Consider one simple case, where the probabilities of
arrivals or service completions are controlled and we use the "controlled"
rates
A+ ba.ff. + CaUa.fi, A+ bd.jf. + cdud.jf.,
where Ua and ud are the control variables. The actual value of the control at
each time n is to be selected at that time, based only on information avail-
able then, and we assume that the control values Ua and ud are bounded
in absolute value by some given constants Ua, Ud·
Suppose that the cost for the physical system is defined by

E; L e-13nek(XE(nE), u~)€ + E; L e-f3nellu~.


00 00

we(x, u) = {1.7)
n=O n=O

where [3 > 0. This cost weighs the loss of a customer heavily relative to the
cost of control or the cost of the waiting time. Define ve(x) = infu We(x, u)
where the infimum is over all admissible controls. The appropriate con-
trolled form of (1.6) and the associated cost are

X(t) = X(O) + (ba- bd)t +lot (caua(s)- cdud(s))ds {1.8)


+ wa(t)- wd(t) + Y(t)- U(t).

W(x,u) = E; 100
e-13tk(X(t),u(t))dt + E; 100
e-11tdU(t). {1.9)

Let V(x) = infu W(x, u), where the infimum is over all "admissible" con-
trols for {1.8). Loosely speaking, what we mean by admissible (see Chapters
8.1 Motivating Examples 221

1 or 9 for a fuller discussion) is that the controls satisfy IUa (t) I :::; Ua for o:
equal to a or d, and the ua(t) are independent of the future of the Wiener
processes in the sense that Ua(t) is independent of {wa(t+s) -wa(t), Wd(t+
s)- wd(t), s;::: 0}. It can be shown that v~(x)---+ V(x), as E---+ 0, and that
continuous "nearly optimal" controls for the limit problem are also "nearly
optimal" when used on the physical problem. See similar results for related
but more complex problems are in [3, 4, 94, 86, 100, 103, 107, 118].
These representations and approximations (1.8) and (1.9) hold under
quite broad conditions, as can be seen from the references. The interarrival
or service intervals need not be exponentially or geometrically distributed,
and only the first and second moments of their distributions appear in the
limit equation. "Batch" arrivals and services can also be introduced. Our
aim here is to motivate the limit system equations only, and the reader is
referred to the literature for further details. The numerical problem consists
of computing an approximation to the optimal cost V(x) and associated
control (and then applying the obtained control to the physical problem).
Note that the cost structure is different from that used for the problems in
Chapter 3, because the process U(·), which is included in the cost, is not
differentiable.

8.1. 3 Example 3. Control of admission, a singular control


problem
In many problems in queueing and production systems, one seeks to re-
duce the cost associated with the waiting time by controlling the number
of customers entering the system, while suitably penalizing the number de-
nied entry. If a customer is denied entry, then that customer is assumed to
disappear from the system. An alternative model might allow it to reap-
pear later with a given probability. Generally, a higher cost is assigned to
a denial of entry than to "waiting." Example 2 will now be revised to ac-
commodate this type of control. Let D.F:,_ denote the indicator of the event
that an arriving customer or job has been denied entry into the queue. By
convention, we suppose that a customer which arrives when the buffer is
full is denied entry. Then the state equations (1.2) are

n-1 n-1 n-1 n-1 n-1


Q~ = Qo + L: D.A~ - L: D.D~ + L: 1::~ - L: 1:::€ - L: D.fr:n ·
m=O m=O m=O m=O m=O
222 8. Heavy Traffic and Singular Control

Define 6.F:r,_ = v'f6.F:r,_. Then in scaled form, we can write (modulo a


negligible error term) the controlled analogue of (1.5)

t/(-1 t/(-1

XE(O) + (ba- bd)t + v'f L 'Y~E- v'f L -y:!;E


m=O m=O (1.10)
t/E-1 t/(-1 t/E-1

+ L D.v:.- L D.u:n- L D.F:n.


m=O m=O m=O

Let the cost be (1.7) with k16.U:r,_ +k26.F:r,_ replacing D.U:r,. The function

t/E-1
F£(t) = L 6.F!
m=O

represents the scaled cumulative control action. It is implied in [118] that


the sequences defined in (1.10) converge in an appropriate sense to those in
(1.6) {with -F(·) added) and the cost converges to {1.9) with U(·) replaced
by k 1U(-) + k2 F(·}, and the F(·), U(·),X(-), Y(-) are non anticipative with
respect to the Wiener process.
The control term F(t) does not necessarily have a representation as
F(t) = f~ f(s)ds, for some bounded control rate /{·). The paths ofF(·)
are simply nonnegative and nondecreasing right continuous functions. Such
controls are called "singular" or, more accurately, singular with respect
to Lebesgue measure, because they are not necessarily representable as
integrals with respect to Lebesgue measure. In recent years, such con-
trols have appeared more frequently as models of many types of systems
[66, 82, 112, 22, 139, 146). Such a control rate might not be "physical" since
it might be unbounded, but as we just saw, they arise very naturally as
approximations to physical systems.
Under broad conditions, one can show [100, 118) that the optimal value
functions for the physical system and for the limit system are arbitrar-
ily close for small f. The interarrival intervals need not be exponentially
distributed. Also, nearly optimal controls for the limit system are nearly
optimal for the physical system for small f. In particular, the optimal con-
trol for the limit system is often of the form: There is a Bo ~ B such that
F(·) keeps the process in the interval [O,Bo] [e.g., when k(·) is increasing
with x and is continuous]. In this case, the cost under the control which
admits an arriving customer only if X((t) E [0, B 0 } is arbitrarily close to
the optimal cost for the physical system for small f. This might not be
surprising for this simple case, but similar results hold under quite weak
assumptions on the arrival and service statistics and cost structure, and
for multidimensional systems. It is generally much easier to compute the
value of the controls for the limit system than it would be for the actual
physical model.
8.1 Motivating Examples 223

8.1.4 Example 4. A multidimensional queueing or production


system under heavy traffic: No control
We will next describe a multidimensional system of the type of Example
1. For the sake of expository simplicity, we first discuss the problem in the
absence of control, which will be added later. The general discussion will
be loose and confined to a simple case, because our main purpose is the
motivation of the reflected diffusion systems models which arise. Let us
work with a general K -dimensional system. There are K service stations
or processors, each serving one customer at a time, and the ith station is
referred to as Pi. Each station might have a sequence of external inputs and
sends its output either to the exterior of the system or to another processor.
Let Pii denote the probability (conditioned on the "past" data) that the
completed output of~ goes to Pi, and with Pio being the probability that
the output leaves the system. It is assumed that the spectral radius of the
transition matrix P = {Pii, i, j = 1, · · ·, K} is less than unity. This implies
that all customers leave the system eventually. Such systems are called open
systems [67]. We also work in discrete time for notational convenience.
We use Po to denote the exterior, whether it is the source or the desti-
nation of a customer. Let Q~€ denote the total number of customers in or
waiting for service at time n at Pi, i = 1, ... , K. Without loss of general-
ity, let Pii = 0. This can be accomplished by a redefinition of the service
intervals. Let Bi/ JE denote the buffer sizes, where Bi is assumed to be an
integral multiple of JE.

The Input-Output Equations. The mass balance equation for this sys-
tem [analogous to (1.2)] is
n-1
Q~€ = Q~€ + L (arrivals to ~ from the exterior at time m)
m=O
n-1
+L L (arrivals to Pi from Pi at time m)
j#.Om=O
n-1
- L L (departures from Pi to Pi at time m)
j m=O
n-1
+L L (corrections for fictitious departures to Pi from ~ at m)
j m=O
n-1
- L L (corrections for fictitious departures from Pi to Pi at m)
j#.Om=O
n-1
-L (corrections for lost inputs due to a full buffer at Pi at m)
m=O

We continue to use the convention introduced in Example 2, where the


224 8. Heavy Traffic and Singular Control

processors "keep processing" even when there are no customers, and the
fictitious outputs thus created are compensated for by a cancelation or
reflection term, which are the various Y -terms below. Define the scaled
occupancies X~£ = y'EQ~E, and let X~ denote the vector with components
{x~E' i ~ K}. In general, for any sequence {z~' n < 00} define the con-
tinuous time parameter interpolation ZE (.) by Z£ (t) = z~ on the interval
[nf, nf + €). Define ~y~,E = y'f times the indicator of the event that a
fictitious departure occurred at Pi at time m and it was sent to Pi. Let
~A~E and ~D~i,£ be the indicators of the events that there is an external
arrival at Pi and a departure from Pi to Pi at time n, respectively. Let
~U~£ denote the indicator of the event that there is an arrival to Pi at
time n which is rejected due to a full buffer. Rewrite the above equation
with the obvious notation, and where tjf is used to denote the integer part

~E-1 ~E-1 ~£-1

Xi,E(t) = x~·e+..ff L ~A~+..ffL: L ~Di~·E-..jfL L ~D~·£


m=O j~O m=O j m=O
t/E-1 t/E-1 t/E-1
+L L ~Y~·E-L: L ~y~i,£_ L ~U~E·
j m=O j~O m=O m=O
(1.11)
Suppose that more than one arrival to some processor occurs at some
time. If the buffer capacity is not exceeded by these arrivals, then their
order is unimportant. Otherwise, order them in some way and reject those
that arrive when the buffer is full.

The Heavy Traffic and Other Assumptions. In keeping with the in-
tuitive idea of heavy traffic, the average total input rate to each processor
from all sources combined will be close to the service rate of that processor.
AB f ---+ 0, the average input and service rates for each processor converge
to each other. The rates will be introduced via their "inverses," the (con-
ditional) means of the interarrival or service intervals. In particular, we
suppose that:

The expectation of the (n + 1) 8 t interarrival interval (arrivals from the


exterior) and service interval for P;, each conditioned on the ''past" up to
the beginning of the interval in question, take the following forms, respec-
tively
[gai + .;Eai(state of system at start of that interval )t 1
(1.12)
[gdi + ../fdi(state of system at start of that interval )t 1 ,
where the gai are constants and the "marginal rates" ai(·) and di(·) are
bounded and continuous functions of their arguments.

Note the analogue to the form used in Example 2. The gai and gdi are the
8.1 Motivating Examples 225

dominant parts of the rates of arrival (for external arrivals) and service,
respectively, at Pi· We also suppose that the conditional variances of these
random intervals, given the "past data," are also continuous functions of
the current state, modulo an error which goes to zero as f-+ 0. The mathe-
matical proof of convergence also needs the additional assumption that the
set of the squares of these intervals is uniformly integrable inn, f [100).
In order to operate in the heavy traffic environment, it is necessary that
the dominant parts of the mean arrival and service rates be equal for each
processor. This implies that

gai + L Pjigdj = gdi· (1.13)


#0

If (1.13) does not hold, then the scaled queue length process at some pro-
cessor will always be either near zero or at the upper limit for small f. The
relations (1.12) and (1.13) are known as the heavy traffic assumption.

Simplification of the Evolution Equation (1.11). Define


t/£-1
yij,£(t) = L !:l.Y/j·£
m=O

and
yi,£(t) = l:yij,£(t).
#0
Then it can be shown [100, 107, 118) that

yii,£(t) = Piiyi,£(t) + asymptotically negligible error.


In (1.4) of Example 1, we split the indicator of the event that there is
an external arrival at time n into the sum of the conditional mean value
>. + ba..fi (which was also the unconditional mean value in that case) and
the random "martingale" difference 1'!~(, and similarly for the indicators of
the service completions. An analogous approach can be taken in the current
case. The sums of the analogous "martingale differences" lead to theM£(·)
term in (1.14) below. Details are in the references. Doing this splitting and
using (1.12) and (1.13) yields that the mass balance equation (1.11) can be
written in the vector form

(1.14)
+ "small error terms".
The ith component of H£ (t) has the form
226 8. Heavy Traffic and Singular Control

where bi(-) = ai(·)- di(-) + Lj Piidj(·). TheM'(·) is a martingale with a


quadratic variation of the form J~ E(X'( s ))ds, where the bounded and con-
tinuous matrix valued function E( ·) can be calculated from the conditional
means and covariances of the external interarrival and service intervals
and the Pii, as shown in the references. The ith component of the reflection
terms Y'(·) and U'(·) are nonnegative, nondecreasing, and can increase
only when the state Xi•'(t) takes the value zero or Bi, respectively.
Suppose that X'(O) converges to a random variable X(O). Then, under
suitable conditions on the bi ( ·) and E( ·), the sequence of processes defined
in {1.14) converges to a limit which satisfies

X(t) = X(O) + H(t) + M(t) +(I- P')Y(t)- U(t), (1.15)

where

M(·) is a stochastic integral with respect to some Wiener process w(·), and
can be written in the form

M(t) =lot E 1 (X(s))dw(s),


1 2

and the reflection terms Y (·) and U (·) have the properties described above
in connection with (1.14). Also X(·), Y(·) and U(·) are nonanticipative with
respect tow(·). Equation (1.15) describes the limit process which we wished
to motivate. Note that it is in the form of the Skorokhod Problem. N umer-
ical methods for such systems and their controlled forms are of increasing
interest. The Markov chain approximation method is easily adapted for use
on such processes.

Remarks on (1.15). See Figure 8.1 for an example of a two dimensional


problem. The directions of reflection are as indicated. At the origin, the set
of allowed reflection directions is the convex cone formed by the directions
at the neighboring points. See the discussion of the Skorokhod Problem in
Chapter 1 and the discussion of the reflecting problem is Section 5. 7. The
directions of reflection on each side of the box can be obtained by noting
that there can be an "increment" in Yi(·) [respectively, in Ui(·)] at timet
only if Xi(t) = 0 [respectively, Xi(t) = Bi]·
8.1 Motivating Examples 227

Figure 8.1. A two dimensional problem.

Adding a Control to (1.11). Suppose that a control of the type used in


(1.7) is added to (1.11). Then the limit dynamical system (1.15) is replaced
by
X(t) = X(O) + H(t) + M(t) +(I- P')Y(t)- U(t), (1.16)
where
Hi(t) =lot bi(X(s), u(s))ds
for appropriate continuous functions bi(-) and a control u(·), and analo-
gously for the singular control problem.

Cost Functions for {1.11) and {1.16). Let G denote the state space
[0, B1] x [0, B2] for (1.16). In order to illustrate the scaling which is needed
for the cost function for the physical system (1.11), we introduce one par-
ticular but interesting case. We now restrict attention to a two dimensional
example for notational simplicity. Let j =f. i, and let k(·) be a continuous
function and ki positive numbers. Define
"Li,€m = u"Ai,€1
u m {x~·=B;} + u"Dii,€1
m {x~·=B;,x~·¥0}' (1.17)

which represents an "overflow" of processor i. For an admissible control


sequence u = {u~, n < oo}, let the cost for the physical system (1.11) be
of the discounted form

L e-m{3€
00

W€(x, u) = E: [Ek(X~, u:n) + .fik1~L~€ + .fik2~L;;€] .


m=O
(1.18)
228 8. Heavy Traffic and Singular Control

The !l.L~€ term in {1.18) penalizes the loss of a customer to processor 1


when the buffer of that processor is full. That lost customer can come from
either an output of processor 2 or from the exterior of the system. Similarly
for !l.L~€·
From (1.18), we can see that the cost assigned to waiting or to the control
itself {the t:k{·) term) is small relative to the cost assigned to lost customers
due to full buffers (the y'f.!l.L terms). The difference in scale is t: vs. y'f.. The
reason is simply that under other seatings either one or both components
of the cost become either negligible or unbounded in the limit as t: -+ 0.
In addition, one might wish to heavily penalize lost customers. The form
{1.18) is only one possibility among many.
If E(-) is never degenerate for x E G, then it can be shown [70],[100,
Chapter 4] that the process
t/<

.ff L !l.D!r!·• 1{x~·=B;} 1{x!r;•=o}


m=O

converges weakly to the zero process. Hence, the limit form of (1.18), which
is the cost for {1.16), is

W(x, u) = E; 1oo e-.Bt[k(X(t), u(t))dt + k1dU1(t) + k2dU 2(t)]. (1.19)

8.1. 5 Example 5. A production system in heavy traffic with


impulsive control
Example 4 covers the case were the marginal arrival or service rates ai ( ·)
and di ( ·) are controlled. Marginal differences in the service or arrival rates
can make a substantial difference in the statistics of the queues, when the
traffic is heavy. We will next discuss a different type of model, where the
limit reflected diffusion model is impulsively controlled. The problem is in-
teresting partly because the nature of the impulsive control is somewhat
nonstandard and is a good example of a new class of problems to which
the numerical methods can be applied. The discussion will be descriptive
only and the reader is referred to [107] for more detail as well as for a
discussion of a numerical algorithm. In this problem there might be "si-
multaneous impulses," but the "order within the simultaneity" is crucial.
By simultaneous impulses, we mean that the control action at some time t
might be a sequence of impulses in different directions taken in a particular
order, with no time gap between them. The possibility of multiple simul-
taneous impulses arises due the scaling which is used as we go to the limit
in the physical model. Events which are separated in time in the physical
model can occur at the same time in the limit due to the way that time is
"squeezed" to get the limit model. But the order in which they occur in the
physical model must be preserved in the order in which we need to take the
8.1 Motivating Examples 229

"simultaneous" impulses in the limit model. This phenomenon affects the


numerical procedure, which must keep track of the correct order. But that
is not a serious problem. The physical model is generally quite hard to work
with, so that the impulsively controlled reflected diffusion approximation
can be quite useful.
The impulsive nature of the control for the limit model is a consequence
of the way the effects of the actual control actions for the physical model
accumulate as the traffic increases. The actual allowable impulses for the
limit model can only be understood in the context of the physical system.
This is the situation for many impulsive and singular control problems,
where the actual model which is used for the calculation or analysis makes
sense only as a limit of a sequence of physical processes as some scaling
parameter goes to its limit.
In the system of concern, the only controls which are allowed are the
actual shutting down of a processor, and the opening or closing of the
links connecting the processors to each other or which connect the external
sources to the processors. To each action, there is an associated immediate
cost as well as a cost due to the actual lost inputs and production. If a link
Pii connecting Pi to Pi is shut down but processor Pi continues to operate,
then the outputs from Pi are assumed to be sent to the outside. They are
lost to the processing system, but there might be a salvage value associated
with them. There will be a cost for these lost customers or for the customers
who cannot enter the system due to full buffers or to processors being shut
down.
Due to the effects of the control actions, the input-output or the mass
balance equation (1.11) needs to be modified by adding or subtracting the
gains or losses which might occur due to the control actions; i.e., the terms

decreases due to lost inputs to I{ when Poi or some Pji, if. 0 is shut off,

increases due to Pi being shut down, but some input is not shut off.
These terms give rise to the impulsive control terms of the limit model.
The directions of the segments of the impulsive controls in the limit model
depend on which combination of links or processors are turned off. See [107]
for more detail.

8.1. 6 Example 6. A two dimensional routing control problem


More interesting versions of the admission control problem of Example 3
can be given when there is more than one processor, so that the internal
routing as well as admissions can be controlled. See [118], where the routing
only is controlled. Refer to Figure 8.2. Processor Po is the routing controller.
It routes the inputs instantaneously to either of the other processors. There
might be external inputs coming directly into processors Pt or P2 also. In
the examples below, it is supposed that some prior preferred routing is
230 8. Heavy Traffic and Singular Control

associated with each of the arrivals to Po, but that this can be changed by
Po with an associated profit or loss. For motivation, consider the following
two particular cases.

Figure 8.2. A routing control problem.


Case 1. There are two classes of customers which arrive at random at
Po (with some prior probability qi that any new arrival will be in class i).
But~. i = 1, 2, is more efficient for the class i. A prior assignment of class
i to Pi is made, but Po can reassign to the other less efficient processor if
the sizes of the waiting lines warrant it. The cost of rerouting might be the
relative cost of the less efficient processor.
Case 2. Continue with Case 1, but let there be three classes of customers
arriving at Po at random. Classes 1 and 2 must be assigned to P1 and P2,
respectively, but there is discretion with Class 3. One of the processors is
more efficient for Class 3, and a prior assignment is made to that processor.
Suppose, for example, that the processors contain data bases with some
overlap in their data set. Class 3 needs only the overlapping data, but
one of the processors is faster, and the prior assignment is made to that
one. The prior assignment can be altered at the discretion of P0 , with an
associated cost.

The Dynamical System and Control Problem. As in Examples 4 and


5, let the buffer size for Pi be Bi/ v'f.. Let D..F:J_•e denote the indicator of
the event that there is an arrival at Po at discrete time m and which has a
prior assignment to Pi, but is reassigned to P1 , where j i= i. Define
n-1
pij,e = If_ "'"" D..ftij,e
n yc ~ m '
m=O

Fni,e = Fji,e _ Fij,e


n n ·
Again, let U~,e denote v'f. times the total number of customers lost to Pi
by time n due to a full buffer. The pij,e(·) represent the control. Let A~i,e
8.1 Motivating Examples 231

denote the number of customers which would be routed to Pi from Po


by time n according to the a priori assignment, and A~E the number of
customers which come directly to Pt. from the exterior by time n. Let D~E
denote the number of (true or fictitious) customers departing Pi by time n.
Recall that, for any sequence Bn, we define B(t) = Bt/E, where t/f. denotes
the integer part. Then, using the other notation of Example 4, the evolution
or mass balance equation which is analogous to (1.14) can be shown to be
(j -:/: i)
xi,E(t) = x~·E + /fAi·E(t) + /fAoi,E(t) _ /fDi·E(t) + pii,E(t)
- pij,E(t) + yi,E(t)- Pjiyi,E(t) - ui,E(t) + "small error."
(1.20)
Let Qi > 0, ki 2: 0, and let k( ·) be continuous. A cost functional which is
analogous to those used in the previous examples is

WE(x,FE) = E;' 1
00
e-Ptk(XE(t))dt

+ E;' 1oo e-IU[qldF12,E(t) + Q2dp21,E(t)] (1.21)

+ E;' 1 00
e-Pt[k1dU 1•E(t) + k2dU 2•E(t)].

Under the heavy traffic assumptions of the type used in Example 4, (1.20)
can be approximated [118] by the "limit system"
Xi(t) = Xi(O) + Hi(t) + Mi(t) + pii(t)- pii (t) + Yi(t)- Piiyi (t)- Ui(t)
(1.22)
where i -:/: j. The limit cost functional is

W(x,F) = E: 1 e-Ptk(X(t))dt
00

+ E: 1 e-Pt (q1dF 12 (t) + q2dF (t))


00
21 (1.23)

+ E: 1 e-Pt (k1dU 1(t) + k2dU 2(t)).


00

Here the H(·) and M(·) terms take the forms of Example 4, with the
appropriate b(·) and E(·) functions. The term M·) is the (scaled) marginal
difference between the input and service rates, and E(·) depends on the
"randomness" of the arrival and service processes. The term

is the control term for the limit system. The pii ( ·) are processes whose
paths are nonnegative, nondecreasing, and right continuous. They are "sin-
gular" controls, and represent the limits of the reassignments. In this prob-
lem, F 1 (-) = -F2 (·), but that is not necessarily the case in general. Thus,
232 8. Heavy Traffic and Singular Control

we have a singular control problem defined in a state space which is the


hyperrectangle B.= {x: 0 ~xi ~ Bi}, and with the type of boundary re-
flection directions of the "heavy traffic" type as in Example 4. The reader
is referred to the references for further detail.

An Extension. For use in Section 8.3, let us write the following K -dimen-
sional extension of the model (1.22) and (1.23)

X(t) = X(O)+ 1t b(X(s))ds+ 1t a(X(s))dw(s)+F(t)+(I-P')Y(t)-U(t),


(1.22')
where for some integer q, F(t) has the representation
q

F(t) = L viFi(t),
i=l

and the Fi(-) are nonnegative, nondecreasing, and nonanticipative pro-


cesses which are right continuous. The cost function is

00
W(x, F)= E: 1 e-r;tk(X(t))dt

+ E{ f e-P< [ ~ q;dF'(t) + ~ k;dU'(t)]·


(1.23')

A Formal Dynamic Programming Equation for the Minimum


Value Function for {1.22'), {1.231). We will next give a formal de-
velopment of the dynamic programming equation for the problem (1.22'),
(1.231). Let V(x) denote the infimum of the value function W(x, F) over all
admissible controls. Let 8 > 0 be small, and let E2 denote the expectation
under zero control and initial condition x. Then, by a formal use of the
principle of optimality, for x E G 0 , we can write

V(x) =min [e-{36 E~[V(X(8)) +k(X(8))8], min(V(x+vi8) +qi8)]. (1.24)


'
See Figure 8.1 for a description of the boundary conditions: On the north
Vx 2 = 0, on the west Vx 1 = 0, on the south -P21 Vx 1 + Vx 2 = 0, and on
the east, Vx 1 - P12 Vx 2 = 0. In (1.24), we suppose that the "approximat-
ing" choices are either to not control over the time interval (0, 8] or else to
instantaneously add an increment 8 to some Fi; i.e., either no control is
used, which leads to the first term inside the outer minimum, or else there
is an increment of size 8 in some pi, which leads to the term in the inner
minimum. The impulsive control term in (1.24) is not discounted because
the control is supposed to act instantaneously. Let £ 0 be the differential op-
erator of the uncontrolled and unreflected diffusion process part of (1.22').
8.1 Motivating Examples 233

Next subtracting V(x) from each side of {1.24) and formally expanding the
terms yields that
.C0 V(x) + k(x)- IJV(x) 2: 0
v:(x)vi + Qi 2: 0, i = 1, · · · ,K
and at each point x, at least one of the K + 1 terms equals zero. Thus, we
formally have

min [.C 0 V(x) + k(x)- fjV(x), min{V:(x)vi


t
+ Qi)] = 0. {1.25)

The reflecting boundary conditions need to be added to {1.25).


Equation {1.25), together with its boundary conditions, is known as a
variational inequality. For the singular control problem, it is the replace-
ment for the PDE's obtained in Chapter 3. See [57, 139] for a more math-
ematical derivation for some related problems (without reflection). The
numerical method based on the Markov chain approximation allows us
to avoid dealing with (1.25), because we approximate the original control
problem (1.22'), {1.23'), rather than the equation {1.25). In Section 8.3, the
dynamic programming equation for the approximating Markov chain will
be given and its formal similarity to (1.25) noted.

8.1. 7 Example 7
An interesting problem in portfolio selection which involves a combination
of singular and ordinary control is in [37]. Let x = (x 0 , xt), where x0 2: 0
is the bank account balance and x1 2: 0 the amount invested in stocks. Let
U(t) [respectively, L(t)] denote the total value of stock sales (respectively,
purchases) by time t, and let c(-) be the "consumption rate." There are
transactions costs for sales (respectively, purchases): one pays a fraction
J.t [respectively, ..\] of the transactions amount. "Infinitesimal" transactions
are allowed and the model is
dxo = (roxo- c)dt- (1 + >.)dL + (1- J.t)dU,
(1.26)
dx1 = r1x1dt + ax1dw + dL- dU,
where ro and r1 are the bank interest rate and the mean rate of increase
of the value of the stocks. The controls are u = {c, L, U).
For a suitable utility function, one wishes to maximize the profit

W(x, u) = E; 100
e-Ptk(c(t))dt. (1.27)

In [37], the form k(c) = d' for >. E (0, 1) was used, and this allowed the
authors to get an (essentially) explicit analytic solution in terms of the
ratio x 0 jx 1 • For other types of utility functions or processes, a numerical
method might be needed.
234 8. Heavy Traffic and Singular Control

Since the state space in (1.26) is unbounded, we might have to bound it


for numerical purposes. This can be done by putting upper limits x0 and
x1 on the bank deposit and stock holding, respectively. The upper bounds
should be large enough so that they do not seriously affect the numerical
results for typical values of the state variables.
The model (1.26), (1.27) in [37] follows a common usage in financial
modeling in that a strictly concave utility function is used. This causes
a difficulty with unbounded consumption "rates" of the type that would
result when upper bounds on the state variable are imposed, because it
would give them zero value. This problem is easy to avoid by defining an
appropriate value for the forced consumption at the upper bound.

Comments. Only a small sample of the many types of heavy traffic and
singular control problems have been considered. [139] treats a singular con-
trol problem where there are infinitely many directions of control. The nu-
merical method for the problem given in [106] is a straightforward extension
of that given here. [67] and [153] treat a problem which arises in heavy traf-
fic modeling, where there are several classes of customers, a ''throughput"
constraint, and an ergodic cost criterion. A numerical method is developed
in [106], which gives a general approach to the Markov chain approximation
method for the ergodic cost problem with a singular control. Forms of the
reflecting diffusions which arise as heavy traffic limits of the "trunk line"
problems in telephone routing can also be dealt with [86, 110]. Ergodic
costs can be handled for all of the problems of this chapter, except for the
singular control problem.

8.2 The Heavy Traffic Problem: A Markov Chain


Approximation
8.2.1 The basic model
Consider the model (1.16) and cost function (1.19), where Xi(t) E [0, Bi],
i = 1, 2. Under appropriate conditions, they are limits of the system and
cost function (1.11) and (1.18), respectively. With these limits in hand, we
would like to obtain numerical approximations to the optimal cost function
V(x) and the optimal control. In practice, one would use an appropriate
adaptation of a "nearly" optimal control for (1.16) and (1.19) on the actual
physical system. Thus the main problem concerns the numerical solution
of the optimization problem for (1.16) and (1.19). The construction of a
locally consistent Markov chain is actually a slight extension of the method
used in Section 5. 7 for the reflected problem and we will review the details
for our special case.
The models (1.15) and (1.16) can be put into the form of the Skorokhod
Problem of Chapter 1. The possible reflection directions are dictated by
8.2 The Heavy Traffic Problem 235

the form of the reflection terms (I- P')Y(t) and -U(t). We next show
how to read the correct reflection directions from {1.15) or {1.16) {they are
the same for both systems). Refer to Figure 8.3a.

Figure 8.3a. The boundary transitions.

The Reflection Directions for {1.15) and {1.16). Write x = (x 1 , x 2 ).


G is the inside box in the figure. The reflection direction are constant on
each of the four open sides of the boundary of G and are as follows:
(a) For x 2 = B 2 , x 1 E (0, Bl), the reflection direction is r(x) = -e 2 =
(0, -1}; i.e., it points downward.
(b) For x 1 = B1. x 2 E {0, B 2 }, the reflection direction is r(x) = -e1 =
( -1, 0).
(c) For x 2 = 0, x 1 E (O,Bl), the reflection direction is r(x) = (-p2 1, 1).
(d) For x 1 = 0, x 2 E (0, B2), we have r(x) == (1, -Pl2)·
The set of allowed reflection directions at a corner is the convex hull of
those associated with the adjoining sides. Recall that the spectral radius of
the "connection" probability matrix P is assumed to be less than unity, in
order to guarantee that each customer spends only a finite average time in
the system. We have also normalized the system such that we can assume
Pii = 0. Owing to the spectral radius condition and the interpretation of P12
and P21 as transition probabilities, both these quantities are no greater than
1, and one of them is strictly less than 1. Let us now examine ( 1.16) a little
236 8. Heavy Traffic and Singular Control

more closely (but still heuristically) to see how these reflection directions
are actually obtained.

More Details of the Calculation of the Reflection Directions for


(1.16). Write (1.16) in the form
X(s) = R(s) +(I- P')Y(s)- U(s), R(s) = (R 1 (s), R 2 (s)).
Fix t, and suppose that X 2 (t) = 0, X 1 {t) E (0, B 1 ). Loosely speaking,
suppose that R 2 (-) tries to "pull" X 2 (-) negative at timet. Then this pull
needs to be compensated for by an increase in Y 2 (·). In particular, let
X 2 (s) = 0, X 1 {s) E (0, B 1 ) on the time interval [t, t + 8] and define f1Ri =
Ri(t + 8)- Ri(t), where flR 2 < 0. Define f1Yi = Yi(t + 8)- Yi(t). Then
we must have flY 2 = -flR2 , flY 1 = 0. Hence,

[I _ P'] 8Y = ( 1 -p21 ) ( 02 ) = ( -P21 ) f1Y2.


-P12 1 flY 1

Hence, the reflection direction is ( -P21, 1) for the boundary points in ques-
tion.
Next, let us repeat this procedure for the corner point (0, 0). Let X 2 (s) =
X 1 (s) = 0 on the interval [t,t + 8), and suppose that f1Ri < O,i = 1,2.
Then
( -~12 -;21 ) ( ~~~ ) =- ( ~~~ ) .
This implies that the set of reflection directions at the corner is the convex
hull of those at the adjoining sides.

8.2.2 The numerical method


The numerical problem is quite similar to that for the variation of the Ex-
ample 2 in Section 5. 7 drawn in Figure 5.8. Let Gh denote the restriction
to G of the state space of a controlled Markov chain which is locally con-
sistent with the unreflected form of (1.16), and with interpolation interval
flth(x, a). In particular, for this illustrative example we will suppose that
the state space is a regular grid with spacing h in each direction and that
the Bi are integral multiples of h. To complete the description of the ap-
proximating chain for the reflected diffusion (1.16), we need only describe
its behavior on the reflecting boundary. Refer to Figure 8.3a, where an ac-
ceptable reflecting boundary act is the set of grid points on the "outer"
boundary. The aCt is disjoint from points on G. We could use points on
G as a "numerical boundary," but it is often more convenient from the
programming point of view to create an "external" reflecting boundary.
Indeed, the use of an external boundary is often closer to the physics of
the problem, where the reflection actually arises from a constraint, where
an "impossible" occurrence is accounted for.
8.2 The Heavy Traffic Problem 237

A Locally Consistent Transition Probability on the Boundary


8Gt. Recall that the transition probabilities for the reflecting states are
not controlled in our examples here (see Section 7.8). Consider point xo.
If P21 equals one {respectively, zero) then the state goes from xo to Xt {re-
spectively, to x2). Suppose that P21 E {0, 1). To realize this desired mean
reflection direction, use the randomization method of Section 5. 7 and set

Similarly for all points x such that x 2 = -hand x 1 E {0, B 1]. The procedure
is analogous for the left hand boundary. The assignment can be completed
by taking all other points on aat to the nearest point on Gh. This set of
rules gives us a natural and locally consistent approximation.

Further Examination of the Lower Right Hand Corner of G. The


rules just suggested for the boundary behavior do yield a locally consis-
tent approximating chain, and they seem to be the simplest rules for this
problem. For possible use on other classes of problems, it is useful to note
that there are other possibilities, and one will be illustrated for the lower
right hand corner. Before doing that in Case 1 below, let us "dissect" the
transition from X3 to X4 = (Bt. 0) which is implied by the ph(x, y) given by
the above stated rules. The results of this discussion will be needed in order
to get the correct cost functional for the approximating Markov chain on
the reflecting boundary.
Case 1. Refer to Figure 8.3b. Let~~= X3. By the above stated rule, we
have ph(xa,x4) = 1 so that ~~+l = X4. At the point x 3 two constraints
are violated, since x~ < 0 and xA > Bt. It is instructive to break the
movement from x 3 to X4 into two segments, correcting for each constraint
violation in turn. Let us first correct for the second component x~ of x 3
(which is negative), and then correct for the first component {which is larger
than B1). To do this, we move first from x 3 along the reflection direction
( -p21, 1) to the point X5 in Figure 8.3b. Then move the remaining distance
(1 - P2t)h from X5 to x4. We can consider the point xa to correspond
physically to the event that there is a lost customer at queue 1, due to a
full buffer and also a simultaneous fictitious output from queue 2. If P2I = 1,
then the fictitious output of queue 2 was the actual input to the buffer of
queue 1. In this case, X5 = X4. If P21 < 1, then the fictitious output of
queue 2 equals the overflow of the buffer of queue 1 only with probability
P21· Hence with probability 1 - P2I, the lost input was an actual true input
(which came from the exterior of the system in this case). We will return
to this decomposition below when we discuss the appropriate cost function
to use for the controlled chain.
238 8. Heavy Traffic and Singular Control

G
X4~\1
xs \

Figure 8.3b. Transitions at the corner.

6 X3

Figure 8.3c. Transitions at the corner.

The method of Case 1 is the most natural and is the one which will be
used.
Case 2. A second possibility for a locally consistent transition probability
at x 3 can be obtained by reversing the order of the "corrections" of case 1.
Refer to Figure 8.3c. Let us correct for the overflow first, by taking X3 to
x 6 • Then correct for the fact that the first component of the state is still
negative by moving along the reflection direction ( -1121, 1) to the point X7.
If P21 < 1, then X7 is not a grid point and we need to randomize between
8.2 The Heavy Traffic Problem 239

xs and x4 in the usual way so as to achieve the desired mean value. It can
be verified that this choice of the transition probability at x 3 is also locally
consistent. It will yield the same asymptotic results as Case 1.
We note that, in the limit, if the driving noise is nondegenerate, then
the contribution of the corner points to the cost is zero, so that the actual
form of the cost used at the corners is asymptotically irrelevant.

The Continuous Time Interpolation 1/Jh (·). Let n be a reflection step,


and use the terminology and more general model of Section 5. 7. Thus,
ch =X E aa+
'>n h • Write ach n + !:l.zh
'>n = !:l.zh n = Eh,o.ach
n• where !:l.zh x,n '>n• In the
problem of concern here, the grid spacing is O(h), communication is to
nearest neighbors only, and we have

Thus,
m 2 n
E sup L:!::izf = O(h)E L:!:l.zf (2.1)
m~n i=O i=O
and {5.7.4) holds.
Owing to the special rectangular shape of G, and using the Case 1 "de-
composition" of the corners, for e~ = X E 8Gt we can write !::iz~ in the
form
(2.2)
where ay:,i (respectively, !:l.U~·i) are nonnegative and can increase only if
e~,i< 0 {respectively, > Bi)· Because in our case only the "lower" reflection
terms might be randomized, the right side of (2.1) is bounded above by
O(h)EIY:+ll·
Define the interpolations uh(t) = u~ on [r!, r!+l) and

and similarly define .zh(·). In Chapter 11, it will be shown that .zh(·) -t
zero process. Now, (5.7.5) takes the form

1/Jh(t) =x +lot b('ljJh(s), uh(s))ds + Mh(t) + Jh(t) (2.3)


+(I- P')Yh(t)- Uh(t) + 8f(t),
where Mh(-), Jh(·) and 8~(-) are defined as in (5.7.5).

The Dynamic Programming Equation for the Markov Chain Ap-


proximation. Let {e~, n < oo} denote a Markov chain which is locally
consistent with the reflecting diffusion (1.16), and use the transition func-
tions described above for the reflecting states. Let u = {u~, n < oo} be
240 8. Heavy Traffic and Singular Control

an admissible control sequence for the chain. A cost functional which is


analogous to (1.19) is, for ki ~ 0,
00

Wh(x, u) = E; L e-!3t~k(~~. u~)~t~


n=O
00

+ E; L e-!3t~k1h [/{e~·'>B,,e~·2~0} + (1 - P2dl{e~·'>B,,e~·2<0}]


n=O
00

+ E; L e-!3t~k2h [I{e~·2>B2,e~·'~O} + (1- p12)/{e~·2>B2,~~·'<0}] .


n=O
(2.4)
The first sum on the right of (2.4) is obviously an appropriate discretization
of the corresponding part of the integral in (1.19). The next sum is an
appropriate analogue of the integral in (1.19) involving U 1 (·) as we shall
now see. Suppose that ~~· 1 = B 1 + h, ~~· 2 ~ 0. Because then ~~i 1 = B1.
the correct overflow correction is h. Suppose that ~~· 1 = B 1 + h, ~~· 2 < 0.
Then recalling the discussion in Case 1 above, we see that the mean overflow
"correction" is h(1 - P2d· The third sum in (2.4) is explained in the same
way. Again we note that the terms in (2.4) with (1-Pii) have no effect in the
limit and can be dropped, if the covariance of the noise is nondegenerate.
By the decomposition in Case 2, (2.4) can be written as

n=O n=O
(2.5)
Let Vh(x) denote the infima of Wh(x,u) over the admissible control
sequences u. For x E Gh, the dynamic programming equation is

vh (x) ~ !!'J!l [e--a.''(x,•) ~ ph(x, yla)Vh(y) + k(x, a)Llth(x, a) l·


(2.6a)
For x E &Gt, the interpolation interval equals zero, the transition proba-
bilities are not controlled and the dynamic programming equation is
Vh(x) = LPh(x, y)Vh(y)
y
(2.6b)
+ k1h [l{x'>B 1 ,x2~0} + (1- P21)l{x'>B ,x2<o})
1

+ k2h [l{x2>B 2 ,x'~O} + (1- P12)/{x2>B2,x'<O}) ·

8.3 Singular Control: A Markov Chain


Approximation
We will work with a two dimensional problem for ease of visualization. It
should be apparent that the ideas are of quite general applicability. The
8.3 Singular Control 241

system and cost function will be (1.22') and (1.23'), respectively, and the
sets C, Ch, and act are the same as in Section 8.2. Let qi > 0, ki 2: 0. Let
ph(x, y) and D.th(x) be a transition probability and interpolation interval
which are locally consistent with the reflected diffusion (1.22') on the state
space Sh = C h u act when the control term F( ·) is dropped. Without loss
of generality, let the Vi in (1.22') satisfy:
All the components of the vectors vi are no greater than unity in absolute
value, and at least one component equals unity in absolute value.
The control in (1.22') can be viewed as a sequence of small impulses act-
ing "instantaneously." With this in mind, we divide the possible behavior
of the approximating Markov chain into three classes:
(i) Suppose that ~~ = X E act. Then we have a "reflection step"' as in
Section 5.7 or Section 8.2 and Llth(x) = 0.
Otherwise, we are in the set Ch E C and there are two choices, only one
of which can be exercised at a time:
(ii) Do not exercise control and use ph(x, y) and D.th(x) which are locally
consistent with the uncontrolled and unreflected diffusion.
(iii) Exercise control and choose the control as described in the next
paragraph.

The Control Step. In order to illustrate the procedure in the simplest


way, let there be only two distinct "impulsive directions," namely, v1 and
v2. Again, the extensions to the general case should be clear. An impulse
in only one direction will be chosen at each control step (not necessary, but
convenient for programming). Suppose that x = ~~ E Ch and we decide to
exert control. Define the mean increment
2
!::l.Fh
n
= ""v·f::l.Fh,i
~ t n '
i=l

where the f::l.Ft:,i are nonnegative. The impulsive control action is deter-
mined by the choice of the direction among the {vi} and the magnitude
of the impulse in the chosen direction. Let vi be the chosen direction. For
convenience in programming, it is preferable if the states move only "lo-
cally." For this reason, the value of the increment f::l.Ft:,i is chosen to take
the state x only to neighboring points. Thus, it equals h.
The procedure is illustrated in Figure 8.4 in a canonical case. In the
figure, xi denotes the point of first intersection of the direction vectors vi
with the neighboring grid lines. We have hvi = Xi- x. Obviously, the Xi
depend on h. If more choices were allowed for the increments, then the
asymptotic results for the optimal value function would be the same.
242 8. Heavy Traffic and Singular Control

Figure 8.4 The control directions.

If the absolute value of each component of the vector Vi were either unity
or zero, then Xi would be a point in the regular h-grid Gh. Otherwise, it
is not. Analogously to what was done for the reflection problem in Section
5. 7 or in Section 8.2 the actual transition is chosen by a randomization
which keeps the mean value as hvi. Thus, for the example in the figure, at
a control step we will have E;;::Lle~ equaling either v1h or v 2 h, according
to the choice of the control direction. Write ph(x, yihvi) for the transition
probability if the control direction is vi. Let i = 2. Then, the corresponding
transition probability is (see the figure for the notation)
(3.1)
The transition probabilities under the choice of direction v1 are analogous.
In analogy to (1.23'), if vi is the chosen direction at time n, then an appro-
priate cost to assign to this control step is
h.
Qih = Qifl.Fn ,t •

A Comment on the Limit of the Control Terms. In the limit, as


h --+ 0, the effects of the randomization disappear. A short calculation
which suggests this will now be given. It will be used when convergence is
discussed in Chapter 11. If n is a control step, then by definition
E~ Lle~ = Ll.F~. (3.2)
Define the error Ll.F~ due to the randomization by
h
-h
Ll.Fn = Llenh - Ll.Fn . (3.3)
8.3 Singular Control 243

Since the partial sums of the tiF~ form a martingale sequence, we can
write
n-1 N-1
E sup L tiFih = O(h)E L ltiPJI· (3.4)
n~N j=O j=O

Equation (3.4) implies that the "error" goes to zero if the sequence of the
costs due to the control are bounded.

The Cost FUnction and Dynamic Programming for the Approxi-


mating Chain. Set e~ = x. Define tit~= tith(e~) for n a diffusion step,
and set tit~ = 0 otherwise. Define t~ = E~,:-01 tit~. For ph an admissible
control sequence, a suitable analogue of (1.23') is

Wh(x, ph) = Et f
n=O
e-.Bt! [k(e~)tit~ + ~ qitiPf:•i + ~ kitiU!·il·
t 1
(3.5)
Then, for x E Gh, the dynamic programming equation is

Vh(x) = min { e-ru:.th(x) LPh(x,y)Vh(y) + k(x)tith(x),


y
(3.6)
min [LPh(x,yihvi)Vh(y)+qih]}.
y

and any suitable approximation to the discount factor can be used. For
X E8Gt, the dynamic programming equation is (2.6b).
A Relationship Between {3.6) and the Dynamic Programming
Equation for {1.22') and {1.23'). The formal dynamic programming
equation for (1.22') and (1.23') is (1.25), to which the reader is referred.
Suppose that the ph(x, y) in (3.6) has been obtained via a finite differ-
ence approximation of the type discussed in Sections 5.1-5.3. Thus, we can
represent the sum

e-,8L}.th(x) LPh(x, y)Vh(y)- Vh(x) + k(x)tith(x)


y

as tith(x) times a finite difference approximation to .C0 V(x)+k(x)-,BV(x).


Let us now rearrange the control term in (3.6) such that it resembles a finite
difference approximation to the control term in (1.25). We work with the
special two dimensional case of Figure 8.4.
Recall that
1- c1 = ph(x,x- e1hlhvl),

1- c2 = ph(x, x- e2hlhv2).
244 8. Heavy Traffic and Singular Control

Subtract Vh(x) from both sides of (3.6). Then the inner minimum divided
by h equals (here i f j)

. [Vh(x- eih)- Vh(x) Vh(x- eih + ejh)- Vh(x) ]


mln h (l - t;) + h Ci + qi ,

which is a finite difference approximation to

This last expression is just the inner minimum in (1.25), once the super-
script h is dropped.
9
Weak Convergence and the
Characterization of Processes

This chapter begins the section of the book devoted to the convergence
proofs and related matters. The purpose of the chapter is to introduce
the mathematical machinery that is needed in the later chapters. Because
particular applications are intended, we do not, in general, give the most
elaborate versions of the theorems to be presented.
Our method for proving convergence of numerical schemes is based on
the theory of weak convergence of probability measures. The theory of weak
convergence of probability measures provides a powerful extension of the
notion of convergence in distribution for finite dimensional random vari-
ables. For the particular problems of this book, the probability measures
are the induced measures defined on the path spaces of controlled processes.
This notion of convergence is important for our purposes, since our approx-
imations to value functions always have representations as expectations of
functionals of the controlled processes.
The first section of the chapter is concerned with general results and the
standard methodology used in weak convergence proofs. Included in this
section is a statement of the Skorokhod representation, which allows the
replacement of weak convergence of probability measures by convergence
with probability one of associated random variables (in an appropriate
topology) for the purposes of certain calculations. The usual application of
weak convergence requires a compactness result on the sequence of prob-
ability measures (to force convergence of subsequences), together with a
method of identification of limits. In Section 9.2 we present sufficient con-
ditions for the required compactness. The conditions will turn out to be
simple to verify for the problems considered in later chapters. Section 9.3
246 9. Weak Convergence and the Characterization of Processes

discusses useful characterizations of the Wiener process and Poisson ran-


dom measures. These results will be used as part of a direct method of
characterizing stochastic processes that will be used often in the sequel.
Therefore, after introducing and discussing the method, we expose some
of the details of its typical application via an example involving uncon-
trolled processes in Section 9.4. In Section 9.5, we define what is meant by
a "relaxed control." Relaxed controls provide a very powerful tool in the
study of the convergence properties of sequences of optimally (or "nearly
optimally") controlled processes. This is due to the fact that under general
conditions, arbitrary sequences of relaxed controls have compact closure.
This is not true of ordinary controls. However, our use of relaxed controls
is simply as a device for proving convergence of numerical schemes. The
controls computed for the discrete state approximating chains will always
be feedback, or Markov, controls.

9.1 Weak Convergence


9.1.1 Definitions and motivation
LetS denote a metric space with metric d and let C(S) denote the set of
real valued continuous functions defined on S. Let Cb(S) and Co(S) denote
the subsets of C(S) given by all continuous functions that are bounded and
have compact support, respectively.
Suppose we are given 8-valued random variables Xn, n < oo and X,
which may possibly be defined on different probability spaces and which
take values in S. Let En and E denote expectation on the probability
spaces on which the Xn and X are defined, respectively. Then we say that
the sequence {Xn, n < oo} converges in distribution to X if Eni(Xn) -t
EI(X) for all I E Cb(S). Let Pn, n < oo, and P denote the measures
defined on (S, B( S)) that are induced by Xn. n < oo and X, respectively.
Clearly, the property of converging in distribution depends only on these
measures: for any I E Cb(S),

Eni(Xn) -t EI(X) {:}Is l(s)Pn(ds) -tIs l(s)P(ds).


We will refer to this form of convergence of probability measures as weak
convergence and use the notation Pn :::} P. Often, we abuse terminology
and notation and also say that the sequence of random variables Xn that
are associated in the manner described above with the measures Pn con-
verges weakly to X and denote this by Xn :::} X. Let g( ·) be any continuous
function from S into any metric space. A direct consequence of the defini-
tion of weak convergence is that Xn :::} X implies g(Xn) :::} g(X). A general
reference for the theory of weak convergence is [13]. More recent works that
9.1 Weak Convergence 247

emphasize the case when the limit is a Markov process and applications
are [52] and [93].
We can jump ahead a bit and indicate the reasons for our particular
interest in this notion of convergence. Consider for the moment the special
case of S = Dk [0, oo ). An example of a random variable that takes values
in the space S is the uncontrolled diffusion process x( ·) where
dx(t) = b(x(t))dt + u(x(t))dw(t),
with x(O) = x given. Of course, x( ·) also takes values in the smaller space
Ck [O,oo), and it is not a priori evident why we have chosen to use the
larger space Dk [0, oo). One important reason is that a basic compactness
result that will be needed in the approach described momentarily is easier
to prove for processes in Dk [0, oo).
Suppose that one is interested in calculating a quantity such as

W(x) ~E. [ [ k(x(s))ds],


where the function k( ·) is continuous and bounded. Then when consid-
ered as a function defined on Dk [O,oo), the mapping¢--+ J{ k(¢(s))ds
is bounded and continuous. Suppose also that there are "approximations"
eh(·) to x(·) available for which the analogous functional

w'(x) ~ E. [ { k(e'(s))ds]
could be readily computed. Recall that some candidate approximations
were developed in Chapter 5. A finite state Markov chain {e~, n < oo} and
an interpolation interval ~th(x) satisfying the local consistency conditions
were constructed, and the process eh(·) was then defined as a piecewise
constant interpolation of {e~,n < oo}. Thus, the processes {eh(·),h > 0}
take values in Dk [0, oo). Then, if the sense in which the processes eh (.)
approximate x(·) is actually eh(·) => x(·), we may conclude Wh(x) --+
W(x). This simple observation is the basis for the convergence proofs to
follow.
In order to make the procedure described above applicable in a broad
setting, we will need convenient methods of verifying whether or not any
given sequence of processes (equivalently sequence of measures) converges
weakly and also for identifying the limit process (equivalently limit mea-
sure).

9.1.2 Basic theorems of weak convergence


Let P(S) be the space of probability measures on (S, B(S)), and suppose
that P_1 and P_2 are in P(S). For a given set A ∈ B(S), define A^ε = {s' :
d(s', s) < ε for some s ∈ A}. We define the Prohorov metric on P(S) by

π(P_1, P_2) = inf {ε > 0 : P_1(A) ≤ P_2(A^ε) + ε for all closed A ∈ B(S)}.

We will see below that convergence in the Prohorov metric is equivalent to
weak convergence when S is separable. This equivalence makes the result
which follows significant for our purposes. A standard reference for most of
the material of this section is Billingsley [13], where proofs of the theorems
can be found.
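As a simple illustration of the metric, consider point masses δ_a and δ_b on the real line. Taking A = {a} in the definition forces b ∈ A^ε, i.e., |a − b| < ε, unless ε ≥ 1, while other closed sets A impose no further constraint; hence

π(δ_a, δ_b) = min{|a − b|, 1},

so that δ_{a_n} converges to δ_a in the Prohorov metric exactly when a_n → a.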
Theorem 1.1. If S is complete and separable, then P(S) is complete and
separable.
Let {P_γ, γ ∈ Γ} ⊂ P(S), where Γ is an arbitrary index set. The collection
of probability measures {P_γ, γ ∈ Γ} is called tight if for each ε > 0 there
exists a compact set K_ε ⊂ S such that

inf_{γ∈Γ} P_γ(K_ε) ≥ 1 − ε.    (1.1)

If the measures P_γ are the induced measures defined by some random
variables X_γ, then we will also refer to the collection {X_γ, γ ∈ Γ} as tight.
The condition (1.1) then reads (in the special case where all the random
variables are defined on the same space)

inf_{γ∈Γ} P{X_γ ∈ K_ε} ≥ 1 − ε.
Theorem 1.2. (Prohorov's Theorem) If S is complete and separable,
then a set {P_γ, γ ∈ Γ} ⊂ P(S) has compact closure in the Prohorov metric
if and only if {P_γ, γ ∈ Γ} is tight.
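For a simple illustration on S = ℝ, a family of normal distributions {N(0, σ²_γ), γ ∈ Γ} is tight whenever sup_γ σ²_γ < ∞ (take K_ε = [−a, a] with a large and use Chebyshev's inequality), and is therefore relatively compact by Theorem 1.2. On the other hand, the sequence of point masses {δ_n, n = 1, 2, ...} is not tight, since δ_n(K) = 0 for all large n whenever K is compact, and indeed it has no weakly convergent subsequence: the mass escapes to infinity.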
Assume that S is complete and separable and that a given sequence of
probability measures has compact closure with respect to the Prohorov
metric. It then follows from Theorem 1.1 that existence of a convergent
subsequence is guaranteed. In typical applications we will then show that
the limits of all convergent subsequences are the same. Arguing by contra-
diction, this will establish the convergence of the original sequence. Pro-
horov's theorem provides an effective method for verifying the compact
closure property. The usefulness of this result is in part due to the fact
that tightness can be formulated as a property of the random variables
associated to the measures Pn. Often these objects have representations
(e.g., SDE) which allow a convenient verification of the tightness property.
Remark 1.3. A simple corollary that will be useful for our purposes is
the following. Let S_1 and S_2 be complete and separable metric spaces, and
consider the space S = S_1 × S_2 with the usual product space topology. For
{P_γ, γ ∈ Γ} ⊂ P(S), let {P_{γ,1}, γ ∈ Γ} ⊂ P(S_1) and {P_{γ,2}, γ ∈ Γ} ⊂ P(S_2)
be defined by taking P_{γ,i} to be the marginal distribution of P_γ on S_i,
for i = 1, 2. Then {P_γ, γ ∈ Γ} is tight if and only if {P_{γ,1}, γ ∈ Γ} and
{P_{γ,2}, γ ∈ Γ} are tight.

We next present several statements which are equivalent to weak conver-


gence. In particular, we note that statements (ii) and (iii) in the theorem
below will be used in some of the convergence proofs of Chapters 10-15.
Let ∂B denote the boundary of the set B ∈ B(S). A set B is said to be a
P-continuity set if P(∂B) = 0.
Theorem 1.4. Let S be a metric space and let P_n, n < ∞, and P be
elements of P(S). Then statements (i)-(iv) below are equivalent and are
implied by (v). If S is separable, then (i)-(v) are equivalent.

(i) P_n ⇒ P;
(ii) limsup_n P_n(F) ≤ P(F) for closed sets F;
(iii) liminf_n P_n(O) ≥ P(O) for open sets O;
(iv) lim_n P_n(B) = P(B) for P-continuity sets B;
(v) π(P_n, P) → 0.

Part (iv) of Theorem 1.4 suggests the following useful extension.


Theorem 1.5. Let S be a metric space, and let P_n, n < ∞, and P be
probability measures in P(S) satisfying P_n ⇒ P. Let f be a real-valued
measurable function on S and define D_f to be the measurable set of points
at which f is not continuous. Let X_n and X be random variables which
induce the measures P_n and P on S, respectively. Then f(X_n) ⇒ f(X)
whenever P{X ∈ D_f} = 0.
Consider once again the example of an uncontrolled diffusion x(·) dis-
cussed at the beginning of this section. Suppose that we are now interested
in estimating

W(x) = E_x [ ∫_0^T k(x(s)) ds + g(x(T)) ],

where g is a smooth bounded function. Assume we have approximations
ξ^h(·) to x(·) in the sense that ξ^h(·) ⇒ x(·) in D^k[0, ∞). Because the func-
tion φ → g(φ(T)) is not continuous on D^k[0, ∞), we cannot directly apply
the definition of weak convergence to conclude W^h(x) → W(x), where

W^h(x) = E_x [ ∫_0^T k(ξ^h(s)) ds + g(ξ^h(T)) ].

However, Theorem 1.5 implies the convergence still holds, since the limit
process x(·) has continuous sample paths (w.p.1) and φ(·) → g(φ(T)) is
continuous at all φ(·) which are continuous.
Remark 1.6. Note that there is an obvious extension of the definition of
weak convergence of probability measures, in which the requirement that
the measures be probability measures is dropped. For T < ∞, let M_T(S)
denote the set of Borel measures M(·) on S satisfying M(S) = T. Via
the identification M_T(S) = {TP : P ∈ P(S)}, we have analogues of all
the statements above regarding weak convergence of probability measures.
In particular, we note the following. If we define tightness for a collection
{M_γ, γ ∈ Γ} ⊂ M_T(S) by requiring that for all ε > 0 there exist compact
K_ε ⊂ S such that

inf_{γ∈Γ} M_γ(K_ε) ≥ T − ε,

then subsets of M_T(S) are relatively compact in the topology of weak
convergence if and only if they are tight.

We finish this section by recalling the Skorokhod representation. Suppose
we are given a sequence of random variables X_n tending weakly to a limit
X. We will see many times in the sequel that the evaluation of limits of
certain integrals associated with the X_n is essential for the purposes of
characterizing the limit X. These calculations would be simpler if all the
random variables were defined on the same probability space and if the
convergence were actually convergence with probability one. The following
theorem allows us to assume this is the case when computing the integrals. A
proof can be found in [52].
Theorem 1.7. Let S be a separable metric space, and assume the proba-
bility measures P_n ∈ P(S), n < ∞, tend weakly to P ∈ P(S). Then there
exists a probability space (Ω̃, F̃, P̃) on which there are defined random vari-
ables X̃_n, n < ∞, and X̃ such that for all Borel sets B and all n < ∞,

P̃{X̃_n ∈ B} = P_n(B),   P̃{X̃ ∈ B} = P(B),

and such that

X̃_n → X̃

with probability one.
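When S = ℝ, a standard construction makes the representation explicit. Let F_n and F denote the distribution functions of X_n and X, let U be a single random variable uniformly distributed on (0, 1), and set X̃_n = F_n^{-1}(U) and X̃ = F^{-1}(U) (generalized inverses). Then X̃_n and X̃ have the distributions P_n and P, and X_n ⇒ X implies F_n^{-1}(u) → F^{-1}(u) for all but at most countably many u, so that X̃_n → X̃ with probability one. The general separable case requires a more elaborate construction; see [52].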

9.2 Criteria for Tightness in D^k[0, ∞)


In the previous section we discussed the notion of tightness of a set of
probability measures on a metric space S, which turned out to be equivalent
to precompactness of the set for spaces S that are complete and separable.
In this section we examine the particular case S = D^k[0, ∞). We will
consider sets {P_γ, γ ∈ Γ} such that each P_γ is the measure induced on
D^k[0, ∞) by a process x^γ(·). To simplify the notation, we will assume that
all the processes {x^γ, γ ∈ Γ} are defined on a common probability space
(Ω, F, P). Statements when this is not the case are obvious modifications of
the ones given here. The criteria for tightness described below, which will
be quite simple to apply for our problems, are due to Aldous and Kurtz

[85, Theorem 2.7b]. Recall that the random time τ is an F_t-stopping time
if {τ ≤ t} ∈ F_t for all t ∈ [0, ∞).
Theorem 2.1. Consider an arbitrary collection of processes {x^γ, γ ∈ Γ}
defined on the probability space (Ω, F, P) and taking values in D^k[0, ∞).
Assume that for each rational t ∈ [0, ∞) and δ > 0 there exists compact
K_{t,δ} ⊂ ℝ^k such that sup_{γ∈Γ} P{x^γ(t) ∉ K_{t,δ}} ≤ δ. Define F_t^γ
to be the σ-algebra generated by {x^γ(s), s ≤ t}. Let T_T^γ be the set of F_t^γ-stopping
times which are less than or equal to T w.p.1, and assume for each T ∈
[0, ∞) that

lim_{δ→0} sup_{γ∈Γ} sup_{τ∈T_T^γ} E (1 ∧ |x^γ(τ + δ) − x^γ(τ)|) = 0.    (2.1)

Then {x^γ, γ ∈ Γ} is tight.
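
As a simple check of how (2.1) is used, suppose that every x^γ(·) is a standard Wiener process w(·). For any stopping time τ ≤ T, the strong Markov property gives

E (1 ∧ |w(τ + δ) − w(τ)|) ≤ (E |w(τ + δ) − w(τ)|²)^{1/2} = δ^{1/2},

which goes to zero uniformly in τ and γ as δ → 0, so (2.1) holds; the compact containment condition follows from Chebyshev's inequality since E w²(t) = t. The same pattern, a conditional second moment bound of order δ holding uniformly over stopping times, is what will be verified for the approximating chains in Section 9.4.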

9.3 Characterization of Processes


As remarked in Section 9.1, our approach to proving the convergence of
numerical schemes will be based on proving the weak convergence of a
sequence of stochastic processes to an appropriate limit process. In Sec-
tion 9.2 a useful condition for precompactness of sequences of processes
was given. In this section we will give characterizations of Wiener pro-
cesses and Poisson random measures. These characterizations will be used
as part of a rather straightforward method for identifying the limits of con-
vergent sequences of processes, where the elements of the sequence have
been chosen to approximate the solution to some given SDE. Together
with the precompactness, this will imply the convergence of processes in
the form we need. The basic approach to be used is as follows. By rewriting
the dynamical equations of the prelimit processes in an appropriate way,
the "parts" of the processes corresponding to the limiting process' Wiener
process and Poisson random measure are identified and the appropriate
convergences demonstrated. Then, by using a simple approximation argu-
ment, it is demonstrated under a weak sense uniqueness assumption that
the sequence of approximations to the SDE converge weakly to the solu-
tion of the SDE. This basic method will be used frequently in the sequel.
An elementary but detailed example of its application is given in the next
section for the case of an uncontrolled diffusion process.
An alternative method of characterization that has come to be widely
used is the martingale problem method of Stroock and Varadhan. This
method could be used here as well and, in fact, has been applied to these
problems in the past [93, 90]. We have chosen not to use it here because it
does not seem to provide for more general results and requires a separate
statement of the appropriate martingale problem for each of the problem

classes we consider. However, it should be noted that the martingale prob-


lem formulation can be a useful tool in establishing the weak sense unique-
ness which we assume. Other methods are also available (e.g., semigroup
based approaches) and a comparison and discussion of various features may
be found in [52].

Wiener Process. Consider a process w(·) defined on a probability space
(Ω, F, P) which takes values in C^n[0, ∞) (for some n < ∞), and which
satisfies w(0) = 0 w.p.1. Let ℒ_w denote the differential operator defined by

ℒ_w f(x) = (1/2) Σ_{i=1}^n f_{x_i x_i}(x)

for f ∈ C_0^2(ℝ^n). Define

M_f(t) = f(w(t)) − f(0) − ∫_0^t ℒ_w f(w(s)) ds.

Suppose that F_t is a filtration such that w(·) is an n-dimensional F_t-Wie-
ner process. Then Itô's formula implies that M_f(t) is an F_t-martingale for
all f ∈ C_0^2(ℝ^n). A useful fact is that the converse is also true. If there is
a filtration F_t defined on (Ω, F, P) such that M_f(t) is an F_t-martingale
for all f ∈ C_0^2(ℝ^n), then w(·) is an n-dimensional F_t-Wiener process. In
particular, if w(t) is a continuous local martingale whose quadratic vari-
ation is tI, then w(·) is a Wiener process with respect to the filtration it
generates.

Poisson Random Measure. Consider an integer-valued random measure
N(·) defined on a probability space (Ω, F, P) which for each ω ∈ Ω is a
measure on the Borel subsets of [0, ∞) × Γ. For θ ∈ C(Γ), set θN(t) =
∫_0^t ∫_Γ θ(ρ) N(ds dρ), and let ℒ_N^θ denote the operator defined by

ℒ_N^θ f(x) = λ ∫_Γ [f(x + θ(ρ)) − f(x)] Π(dρ)

for f ∈ C_0(ℝ). Define

M_f(t) = f(θN(t)) − f(0) − ∫_0^t ℒ_N^θ f(θN(s)) ds.

If there is a filtration F_t defined on (Ω, F, P) such that M_f(t) is an F_t-mar-
tingale for all f ∈ C_0(ℝ) and θ ∈ C(Γ), then N(·) is an F_t-Poisson
random measure with intensity measure λ dt × Π(dρ) on [0, ∞) × Γ.

Lastly, we note the important fact that a Wiener process and a Poisson
random measure that are defined on the same probability space and with
respect to the same filtration are mutually independent [75, Theorem 6.3].

9.4 An Example
It is instructive to see how the results outlined so far in this chapter yield
convergence of numerical schemes in a simple example. Because the ex-
ample is purely motivational, and since the full control problem will be
treated in Chapter 10, we consider a numerical problem for a simple one
dimensional diffusion. Although simple, the example will illustrate the way
that weak convergence methods can be used to justify numerical approxi-
mations. The problem considered can in some cases be more easily treated
by classical methods from numerical analysis. However, it will expose some
important points and will illustrate the typical use of the material pre-
sented in the last three sections. This section will also serve as a reference
point for our more involved use of the basic methods later in the book.
We consider a problem with a diffusion process that is the solution to
the one dimensional SDE

dx = b(x)dt + σ(x)dw,   x(0) = x.    (4.1)

The problem of interest will be the approximation of the function

W(x) = E_x [ ∫_0^τ k(x(s)) ds + g(x(τ)) ],

where τ = inf{t : x(t) ∈ {0, B}} and W(x) = g(x) for x ∈ {0, B}. This
is the problem considered in Example 3 of Section 5.1. For simplicity, we
assume the existence of c > 0 such that σ²(x) ≥ c. We also assume that
b(·) and σ(·) are Lipschitz continuous. Recall that by Theorem 1.3.1 this
implies the weak sense uniqueness of the solution to (4.1).

Selection of an Approximating Chain. Let h be such that B/h is an
integer. For our choice of approximating Markov chain we can use any of
the chains developed in Section 5.1 for this model. Using the notation of
Chapter 5, we define

p^h(x, x ± h) = [σ²(x)/2 + h b^±(x)] / [σ²(x) + h|b(x)|],   Δt^h(x) = h² / [σ²(x) + h|b(x)|].

Let {ξ^h_n} be a Markov chain with transition probabilities p^h(·,·) and ξ^h_0 = x.
As usual, we define

ξ^h(t) = ξ^h_n for t ∈ [t^h_n, t^h_{n+1}), where t^h_n = Σ_{i=0}^{n−1} Δt^h(ξ^h_i).

We also define the stopping times τ_h = t^h_{N_h}, N_h = inf{n : ξ^h_n ∈ {0, B}}.
The approximation to W(x) is then given by

W^h(x) = E_x [ Σ_{i=0}^{N_h − 1} k(ξ^h_i) Δt^h(ξ^h_i) + g(ξ^h_{N_h}) ]
       = E_x [ ∫_0^{τ_h} k(ξ^h(s)) ds + g(ξ^h(τ_h)) ]

for points x of the form x = ih, i ∈ {0, ..., B/h}. Recall that W^h(x) satisfies
the relation (5.1.13), which is the equation used for numerically computing
W^h(·).
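
The following fragment sketches one way to carry out that computation for this one-dimensional example. It is only illustrative: the particular coefficient functions, the value of B, and the routine names are ours, and the linear system solved is simply the matrix form of the relation W^h(x) = Σ_y p^h(x, y) W^h(y) + k(x) Δt^h(x) with W^h = g on {0, B}.

    import numpy as np

    # Illustrative data (assumptions, not from the text): B = 1 and smooth b, sigma, k, g.
    B = 1.0
    b     = lambda x: 0.5 - x          # drift
    sigma = lambda x: 0.3 + 0.2 * x    # diffusion coefficient, bounded away from 0 on [0, B]
    k     = lambda x: 1.0              # running cost
    g     = lambda x: 0.0              # boundary cost on {0, B}

    def solve_Wh(h):
        """Solve W^h(x) = sum_y p^h(x, y) W^h(y) + k(x) dt^h(x) on {0, h, ..., B},
        with W^h = g on {0, B}, using the locally consistent p^h, dt^h above."""
        n = int(round(B / h))
        x = np.linspace(0.0, B, n + 1)
        A = np.eye(n + 1)
        rhs = np.zeros(n + 1)
        rhs[0], rhs[n] = g(0.0), g(B)
        for i in range(1, n):
            xi = x[i]
            denom = sigma(xi) ** 2 + h * abs(b(xi))
            p_up   = (sigma(xi) ** 2 / 2.0 + h * max(b(xi), 0.0)) / denom
            p_down = (sigma(xi) ** 2 / 2.0 + h * max(-b(xi), 0.0)) / denom
            dt = h ** 2 / denom
            # row i of (I - P^h) W^h = k dt
            A[i, i + 1] -= p_up
            A[i, i - 1] -= p_down
            rhs[i] = k(xi) * dt
        return x, np.linalg.solve(A, rhs)

    for h in [0.1, 0.05, 0.025]:
        x, W = solve_Wh(h)
        print(f"h = {h:6.3f}:  W^h(B/2) ~ {W[len(W)//2]:.5f}")

As h is refined, the computed values stabilize, which is the behavior the convergence theory of this chapter is designed to justify.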
In this example, we have used the processes ξ^h(·), rather than ψ^h(·), in
the representation for W^h(x). This was done in order to slightly simplify
the notation. It is worth noting that it is often more convenient to work
with the ψ^h(·) processes. See, for example, Chapter 11. In general, we will
work with the processes that are most convenient for the problem at hand.

An Outline of the Proof of Convergence. We will prove that ξ^h(·) ⇒
x(·), τ_h ⇒ τ, and that the mapping

(φ(·), t) → ∫_0^t k(φ(s)) ds + g(φ(t))

is continuous (w.p.1) with respect to the measure induced by (x(·), τ). By
Theorem 1.5 this implies

∫_0^{τ_h} k(ξ^h(s)) ds + g(ξ^h(τ_h)) ⇒ ∫_0^τ k(x(s)) ds + g(x(τ)).

Under a uniform integrability condition that is also proved below, the
last equation implies the convergence of the expectations, and therefore
W^h(x) → W(x). Most of the effort involves showing ξ^h(·) ⇒ x(·). To do
this, we first prove tightness. We then take any subsequence of {ξ^h(·), h >
0}, extract a convergent subsequence, and identify the limit as a solu-
tion to (4.1). The weak sense uniqueness of solutions to (4.1) then gives
ξ^h(·) ⇒ x(·).

Theorem 4.1. The collection {ξ^h(·), h > 0} is tight.

Proof. We must show the assumptions of Theorem 2.1 hold for the processes
ξ^h(·). Recall that E^h_n denotes expectation conditioned on F(ξ^h_i, i ≤ n)
and that Δξ^h_n = ξ^h_{n+1} − ξ^h_n. By construction, the chain satisfies the local
consistency conditions, and by a calculation given in Section 5.1

E^h_n Δξ^h_n = b(ξ^h_n) Δt^h(ξ^h_n),
E^h_n (Δξ^h_n − E^h_n Δξ^h_n)² = [σ²(ξ^h_n) + O(h)] Δt^h(ξ^h_n).    (4.2)
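To see where (4.2) comes from, the calculation referred to above can be sketched directly from the definitions of p^h and Δt^h given earlier:

E^h_n Δξ^h_n = h p^h(ξ^h_n, ξ^h_n + h) − h p^h(ξ^h_n, ξ^h_n − h) = h² [b^+(ξ^h_n) − b^-(ξ^h_n)] / [σ²(ξ^h_n) + h|b(ξ^h_n)|] = b(ξ^h_n) Δt^h(ξ^h_n),

E^h_n (Δξ^h_n)² = h² [p^h(ξ^h_n, ξ^h_n + h) + p^h(ξ^h_n, ξ^h_n − h)] = h² = [σ²(ξ^h_n) + h|b(ξ^h_n)|] Δt^h(ξ^h_n),

so the conditional variance equals [σ²(ξ^h_n) + h|b(ξ^h_n)|] Δt^h(ξ^h_n) − b²(ξ^h_n)(Δt^h(ξ^h_n))² = [σ²(ξ^h_n) + O(h)] Δt^h(ξ^h_n), since b(·) is bounded and Δt^h(·) = O(h²).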
Let N_h(t) = max{n : t^h_n ≤ t}. Using (4.2), we compute

E_x |ξ^h(t) − x|² = E_x | Σ_{i=0}^{N_h(t)−1} [E^h_i Δξ^h_i + (Δξ^h_i − E^h_i Δξ^h_i)] |²
  ≤ 2 E_x | Σ_{i=0}^{N_h(t)−1} b(ξ^h_i) Δt^h(ξ^h_i) |² + 2 E_x Σ_{i=0}^{N_h(t)−1} [a(ξ^h_i) + O(h)] Δt^h(ξ^h_i)
  ≤ 2K²t² + 2(K + O(h)) t,

where K is a bound for |b(x)| ∨ a(x) for all x ∈ [0, B]. Together with
Chebyshev's inequality, this yields the first condition assumed in Theorem
2.1.
We must also prove (2.1). In the present context, this condition may be
rewritten as

lim_{δ→0} sup_{h>0} sup_{τ∈T_T^h} E_x (1 ∧ |ξ^h(τ + δ) − ξ^h(τ)|) = 0,    (4.3)

where T_T^h is the set of F_t^h-stopping times which are less than or equal to
T w.p.1 and F_t^h is the σ-algebra generated by {ξ^h(s), s ≤ t} = {ξ^h_i, i ≤
N_h(t)}. The limit (4.3) can be proved by calculations similar to those of the
previous paragraph. Using the strong Markov property of the process
{ξ^h_i, i < ∞}, we have

E_x (1 ∧ |ξ^h(τ + δ) − ξ^h(τ)|) ≤ ( E_x |ξ^h(τ + δ) − ξ^h(τ)|² )^{1/2}
                               ≤ (2K²δ² + 2(K + O(h))δ)^{1/2}

for any τ ∈ T_T^h. This implies (4.3). ∎
Theorem 4.2. The processes ξ^h(·) converge weakly to a solution x(·) of
equation (4.1).

Proof. In the arguments to follow, we will extract subsequences several
times. To keep the notation reasonable, we abuse notation and retain
the index h for each successive subsequence. Consider any subsequence
{ξ^h(·), h > 0}. By tightness, we may extract a weakly convergent subse-
quence, again referred to as {ξ^h(·), h > 0}. We will prove ξ^h(·) ⇒ x(·). By
the usual argument by contradiction, this proves that the original sequence
converges weakly to x(·).
To prove that ξ^h(·) converges weakly to the solution of (4.1), we essen-
tially "construct" the Wiener process appearing in the representation (4.1)

for the limit process. The local consistency condition (4.2) gives

ξ^h(t) − x = Σ_{i=0}^{N_h(t)−1} [E^h_i Δξ^h_i + (Δξ^h_i − E^h_i Δξ^h_i)]
           = Σ_{i=0}^{N_h(t)−1} b(ξ^h_i) Δt^h(ξ^h_i) + Σ_{i=0}^{N_h(t)−1} σ(ξ^h_i) [ (Δξ^h_i − E^h_i Δξ^h_i) / σ(ξ^h_i) ].

This suggests the definition

w^h(t) = Σ_{i=0}^{n−1} (Δξ^h_i − E^h_i Δξ^h_i) / σ(ξ^h_i)   for t ∈ [t^h_n, t^h_{n+1}).    (4.4)
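To see why (4.4) is a reasonable candidate for an approximation to a Wiener process, note that by (4.2) the summands in (4.4) are martingale differences with

E^h_n [w^h(t^h_{n+1}) − w^h(t^h_n)] = 0,   E^h_n [w^h(t^h_{n+1}) − w^h(t^h_n)]² = [σ²(ξ^h_n) + O(h)] Δt^h(ξ^h_n) / σ²(ξ^h_n) = Δt^h(ξ^h_n)[1 + O(h)],

where σ²(·) ≥ c > 0 is used in the last equality. Thus w^h(·) is a martingale whose sum of conditional variances over [0, t] is t + O(h), and its jumps go to zero with h; the argument below shows that any weak limit is in fact a standard Wiener process.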

From the calculations which were used to prove the tightness of the se-
quence {ξ^h(·), h > 0} we obtain tightness of the sequence {w^h(·), h > 0}.
Let {(ξ^h(·), w^h(·)), h > 0} be a convergent subsequence, and denote the
limit by (x(·), w(·)). We first prove that w(·) is indeed a Wiener process.
Let us fix t ≥ 0, τ > 0, q < ∞, t_i ∈ [0, t] with t_{i+1} > t_i for i ∈ {0, ..., q},
and any bounded continuous function H: ℝ^{2q} → ℝ. Let f ∈ C_0^2(ℝ) and
let ℒ_w be the differential operator of the Wiener process, i.e., ℒ_w f(x) =
(1/2) f_{xx}(x). From the definition of w^h(·),

f(w^h(t + τ)) − f(w^h(t)) − ∫_t^{t+τ} ℒ_w f(w^h(s)) ds
  = Σ_{i=N_h(t)}^{N_h(t+τ)−1} [f(w^h(t^h_{i+1})) − f(w^h(t^h_i))]
    − (1/2) Σ_{i=N_h(t)}^{N_h(t+τ)−1} f_{xx}(w^h(t^h_i)) Δt^h(ξ^h_i) + O(h²)
  = Σ_{i=N_h(t)}^{N_h(t+τ)−1} f_x(w^h(t^h_i)) [Δξ^h_i − E^h_i Δξ^h_i] / σ(ξ^h_i)
    + (1/2) Σ_{i=N_h(t)}^{N_h(t+τ)−1} f_{xx}(w^h(t^h_i)) [Δξ^h_i − E^h_i Δξ^h_i]² / σ²(ξ^h_i)
    − (1/2) Σ_{i=N_h(t)}^{N_h(t+τ)−1} f_{xx}(w^h(t^h_i)) Δt^h(ξ^h_i) + ε_h + O(h²),

where E|ε_h| → 0 as h → 0. By using this expression together with the
consistency condition (4.2) we have

| E H(ξ^h(t_i), w^h(t_i), 1 ≤ i ≤ q) [ f(w^h(t + τ)) − f(w^h(t)) − ∫_t^{t+τ} ℒ_w f(w^h(s)) ds ] |
  ≤ E|ε_h| + O(h).    (4.5)



At this point we would like to take limits in (4.5) in order to obtain

E H(x(t_i), w(t_i), 1 ≤ i ≤ q) [ f(w(t + τ)) − f(w(t)) − ∫_t^{t+τ} ℒ_w f(w(s)) ds ] = 0.    (4.6)
If all of the processes ξ^h(·), w^h(·), x(·), and w(·) were defined on a common
probability space and if, rather than weak convergence, the actual sense
of convergence were (ξ^h(·), w^h(·)) → (x(·), w(·)) w.p.1, then (4.6) would
follow from the dominated convergence theorem. But for purposes of sim-
ply computing the expectation in (4.6) it is only the distributions of the
processes that are important and not the probability space on which the
processes (x(·), w(·)) are defined. By the Skorokhod representation (Theo-
rem 1.7) there exists a probability space on which there are defined random
processes (ξ̃^h(·), w̃^h(·)) and (x̃(·), w̃(·)) such that for h > 0, (ξ̃^h(·), w̃^h(·))
has the same distribution as (ξ^h(·), w^h(·)), (x̃(·), w̃(·)) has the same distri-
bution as (x(·), w(·)), and (ξ̃^h(·), w̃^h(·)) → (x̃(·), w̃(·)) w.p.1. By replacing
ξ^h(·), w^h(·), x(·), and w(·) in (4.5) and (4.6) by ξ̃^h(·), w̃^h(·), x̃(·), and w̃(·),
and taking limits as h → 0, we obtain (4.6) as written.
Let F_t = F(x(s), w(s), s ≤ t). F_t is also the σ-algebra generated by all
random variables of the form H(x(t_i), w(t_i), 1 ≤ i ≤ q), where H(·) is any
bounded and continuous function of 2q variables, t_i ∈ [0, t] for i ≤ q, and q
is any finite integer. By (4.6)

E [ f(w(t + τ)) − f(w(t)) − ∫_t^{t+τ} ℒ_w f(w(s)) ds ] 1_A = 0

for all A ∈ F_t, which is equivalent to the statement that

E [ f(w(t + τ)) − f(w(t)) − ∫_t^{t+τ} ℒ_w f(w(s)) ds | F_t ] = 0

w.p.1. We therefore conclude that f(w(t)) − f(w(0)) − ∫_0^t ℒ_w f(w(s)) ds is
an F_t-martingale for all f ∈ C_0^2(ℝ). By construction the jumps of w^h(·)
converge to zero uniformly, which implies that the limit process w(·) has
continuous sample paths. Hence, the characterization given in Section 9.3
implies w(·) is an F_t-Wiener process.
We next identify the process x(·). In the same way that it was used in
the next to last paragraph, we can employ the Skorokhod representation
and assume that ξ^h(·) → x(·) with probability one. For each δ > 0 define
the piecewise constant processes ξ^h_δ(·) and x_δ(·) by ξ^h_δ(t) = ξ^h(iδ) and
x_δ(t) = x(iδ) for t ∈ [iδ, iδ + δ).
Then ξ^h_δ(·) → x_δ(·) with probability one in D[0, ∞). From the definition

of the process w^h(·),

ξ^h(t) − x = Σ_{i=0}^{N_h(t)−1} [E^h_i Δξ^h_i + (Δξ^h_i − E^h_i Δξ^h_i)]
           = ∫_0^t b(ξ^h(s)) ds + Σ_{i=0}^{N_h(t)−1} σ(ξ^h_i) [w^h(t^h_{i+1}) − w^h(t^h_i)] + O(h²).

Using this representation, the continuity and boundedness of b(·) and σ(·),
and the tightness of {ξ^h(·), h > 0}, we can write

ξ^h_δ(t) − x = ∫_0^t b(ξ^h_δ(s)) ds + Σ_{j=0}^{[t/δ]} σ(ξ^h_δ(jδ)) [w^h(jδ + δ) − w^h(jδ)] + O(h²) + ε^h_{δ,t},

where [s] denotes the integer part of s and where E|ε^h_{δ,t}| → 0 as δ → 0,
uniformly in h > 0 and t in any bounded interval. Taking the limit as h → 0
yields

x_δ(t) − x = ∫_0^t b(x_δ(s)) ds + Σ_{j=0}^{[t/δ]} σ(x_δ(jδ)) [w(jδ + δ) − w(jδ)] + ε_{δ,t},

where E|ε_{δ,t}| → 0 as δ → 0. For each j, the random variable x_δ(jδ) = x(jδ)
is independent of the random variables {w(s) − w(jδ), s ≥ jδ} (since w(·)
is an F_t-Wiener process). The boundedness of σ(·) and properties of the
Wiener process imply

x_δ(t) − x = ∫_0^t b(x_δ(s)) ds + ∫_0^t σ(x_δ(s)) dw(s) + ε̄_{δ,t},

where E|ε̄_{δ,t}| → 0 as δ → 0. By (1.3.3),

∫_0^t σ(x_δ(s)) dw(s) → ∫_0^t σ(x(s)) dw(s)

as δ → 0. Therefore, x(·) solves

x(t) = x + ∫_0^t b(x(s)) ds + ∫_0^t σ(x(s)) dw(s)

as claimed. ∎

A Topology for the Set [0, ∞]. Because the cost W^h(x) also involves the
potentially unbounded stopping times τ_h, we must consider weak conver-
gence for sequences of random variables with values in [0, ∞]. We consider
[0, ∞] as the one point compactification of [0, ∞); i.e., the point {∞} is ap-
pended to the set [0, ∞) as the limit point of any increasing and unbounded
sequence. Since the set [0, ∞] is compact, any sequence of random variables
taking values in this set, and in particular the sequence of stopping times
{τ_h, h > 0}, is tight.
Theorem 4.3. Under the assumptions of this section, we have W^h(x) →
W(x).

Proof. Consider the pair (ξ^h(·), τ_h). Let {(ξ^h(·), τ_h), h > 0} be a conver-
gent subsequence, with limit denoted by (x(·), τ̄). Using the Skorokhod
representation, we can assume that the convergence is

(ξ^h(·), τ_h) → (x(·), τ̄)    (4.7)

with probability one. Before we can apply Theorem 1.5 to show W^h(x) →
W(x), there are several issues to resolve. Recall that τ = inf{t : x(t) ∈
{0, B}}. From the definitions of W^h(x) and W(x) we see that to prove
W^h(x) → W(x) we will need

τ̄ = τ    (4.8)

w.p.1. Furthermore, it must be demonstrated that the mapping

(φ(·), t) → ∫_0^t k(φ(s)) ds + g(φ(t))    (4.9)

is continuous (w.p.1) with respect to the measure induced by (x(·), τ) on
D[0, ∞) × [0, ∞].
Consider the mapping from D[0, ∞) to [0, ∞] given by

f(φ(·)) = inf{t : φ(t) ∈ {0, B}}.

Since k(·) and g(·) are continuous and since the sample paths of x(·) are
continuous w.p.1, sufficient conditions for the continuity w.p.1 of (4.9) are

f(x(·)) < ∞ and f(·) is continuous at x(·)    (4.10)

w.p.1. Furthermore, if (4.10) holds w.p.1, then (4.7) implies

τ_h → τ

w.p.1. By the usual argument by contradiction, we have τ_h ⇒ τ for the
original sequence. Thus, (4.10) implies (4.8).
Under (4.10), equation (4.7) implies

∫_0^{τ_h} k(ξ^h(s)) ds + g(ξ^h(τ_h)) → ∫_0^τ k(x(s)) ds + g(x(τ))

w.p.1. Assume that the collection of random variables

{ ∫_0^{τ_h} k(ξ^h(s)) ds + g(ξ^h(τ_h)), h ∈ (0, h_0) }    (4.11)

is uniformly integrable for some h_0 > 0. Then this uniform integrability
implies the desired convergence

W^h(x) → W(x).

The boundedness of k(·) and g(·) implies that a sufficient condition for the
uniform integrability of (4.11) is uniform integrability of

{τ_h, h ∈ (0, h_0)}.    (4.12)

Except for the proofs of (4.10) and (4.12), the proof of W^h(x) → W(x) is
complete. ∎

Proofs of (4.10) and (4.12). Conditions analogous to (4.10) will appear
in virtually all problems we consider with an absorbing boundary. If, in
addition, the problem is of interest over a potentially unbounded interval
(as here), a condition similar to (4.12) will be required.
We turn first to the proof of (4.10) and consider the continuity of the es-
cape time. We treat only the case φ(f(φ(·))) = B, since the case φ(f(φ(·)))
= 0 is symmetric. Also, we may assume φ(·) is continuous because x(·)
has continuous sample paths w.p.1. Suppose that for all δ > 0 we have
φ(t) > B for some t ∈ (τ, τ + δ), where τ = f(φ(·)). Then φ_n → φ implies
limsup f(φ_n(·)) ≤ f(φ(·)). On the other hand, the fact that φ(t) < B for
t < f(φ(·)) implies liminf f(φ_n(·)) ≥ f(φ(·)). Thus, to prove the continuity
w.p.1 of f(φ(·)) under the law of x(·), we need only prove that for all δ > 0,
x(t) > B for some t ∈ (τ, τ + δ], w.p.1. In the present example, this holds
because of sample path properties of the stochastic integral, as will now be
shown.
We recall the law of the iterated logarithm. If w(·) is a standard Wiener
process, then

limsup_{t↓0} w(t) / (2t log log(1/t))^{1/2} = 1

w.p.1. Thus, the supremum of w(s) for s in the interval (0, t) behaves like
(2t log log(1/t))^{1/2} as t → 0. Suppose ψ(t) is a Lipschitz continuous function
with ψ(0) = 0. It follows that for all t_0 > 0, w(t) − ψ(t) > 0 for some
t ∈ [0, t_0], w.p.1. We next apply this result to x(·). Recall that a(·) = σ²(·).
There is a rescaling of time s → t(s) satisfying s/C ≤ t(s) ≤ s/c for c =
inf_{x∈[0,B]} a(x) and C = sup_{x∈[0,B]} a(x), and such that

w̄(s) = ∫_τ^{τ+t(s)} σ(x(r)) dw(r)

is a standard Wiener process [83, Theorem 3.4.6]. Because ∫_τ^{τ+t(s)} b(x(r)) dr
is Lipschitz continuous as a function of s, the law of the iterated logarithm implies x(t) >
B w.p.1 for some t ∈ (τ, τ + δ]. Consequently, we have the w.p.1 continuity
of f(φ(·)) under the law of x(·).
Now consider (4.12). Assume there is T < ∞ and h_0 > 0 such that

inf_{h∈(0,h_0]} inf_{x∈{0,h,2h,...,B}} P_x{τ_h < T} = δ > 0.    (4.13)

Then for all h ∈ (0, h_0] and x ∈ {0, h, 2h, ..., B}, P_x{τ_h ≥ T} ≤ (1 − δ).
From the Markov property of ξ^h(·), P_x{τ_h ≥ iT} ≤ (1 − δ)^i for i < ∞.
Therefore,

E_x (τ_h)² ≤ Σ_{i=1}^∞ (iT)² (1 − δ)^{i−1} < ∞,

which implies the uniform integrability of (4.12). The condition (4.13) can
be established using only weak convergence arguments and properties of
the limit process x(·). First, we note that

inf_{x∈[0,B]} P_x{τ < T} > 0

for all T > 0. This can be proved in many ways. For example, the inequality
can be shown by using the same time change as used in the proof of (4.10)
together with the explicit form of the Gaussian distribution. Next, assume
that (4.13) is not true. Then there exist T > 0, h_n → 0, and x_n → x ∈ [0, B]
such that lim_n P_{x_n}{τ_{h_n} < T} = 0. Consider processes ξ^{h_n}(·) which start at
x_n rather than at a fixed point x at time t = 0. Because {x_n, n < ∞} is
contained in a compact set, the same calculations as those used in the case of a fixed starting
position imply that {ξ^{h_n}(·), n < ∞} is tight, and also that ξ^{h_n}(·) tends weakly to
a solution x(·) of (4.1). As shown above, the exit time f(φ(·)) is a continuous
function of the sample paths of x(·) with probability one, so, by Theorem
1.5, τ_{h_n} ⇒ τ. Using part (iii) of Theorem 1.4,

liminf_n P_{x_n}{τ_{h_n} < T} ≥ P_x{τ < T} > 0.

By contradiction, (4.13) is true. ∎

Remarks. In the sequel we will see many variations on the basic ideas of
this example. In all cases the required adaptations are suggested directly
by the particular properties of the problem under consideration. Here we
comment on several aspects of the problem considered in this section.
Part of the power of the approach advocated in this book is the ease
with which it handles less stringent assumptions. Aside from weak sense
uniqueness assumptions on certain limit processes, which would seem to
be expected in any case as a consequence of proper modelling, there is
considerable flexibility in weakening the other assumptions. For example,

suppose the condition σ²(x) ≥ c > 0 on [0, B] is dropped. In order that
the process not have any points at which it is "stuck," let us assume
inf_{x∈[0,B]} [σ²(x) + |b(x)|] > 0. Then with some minor modifications and
appropriate additional assumptions on b(·), the method still can be used.
For example, at points where σ(x) = 0 the definition (4.4) of the approxi-
mation w^h(·) to the limiting Wiener process must be modified. This reflects
the fact that we cannot "reconstruct" the Wiener process from the process
x(·) at those points. However, minor modifications of the definition of w^h(·)
and the associated filtration solve the problem, and the analogous conclu-
sion regarding convergence of ξ^h(·) follows as before. The details may be
found in Chapter 10.
If we weaken the nondegeneracy assumption on σ(·), then we must also
reconsider the proofs of (4.10) and (4.12). These conditions are not simply
technical nuisances unrelated to the convergence of the schemes; they bear directly
on whether the problem is well formulated. For example, suppose σ(x) = 0 in neighborhoods of both 0
and B. Suppose also that b(B) ≤ 0. Then clearly the process x(·) does not
exit through the point B. An analogous statement holds if b(0) ≥ 0. In such
a case, (4.12) clearly fails. However, it will also be true that for reasonable
choices of k(·), W(x) = ∞. Thus, we can have seemingly reasonable
approximations (i.e., local consistency holds), and only the failure of (4.12)
indicates a difficulty with problem formulation. Precise conditions for such
degenerate problems and verification of the analogues of (4.10) and (4.12)
will be described in detail in Chapter 10.

9.5 Relaxed Controls


In the last section it was demonstrated that the convergence properties of
suitably scaled and interpolated Markov chains could be used to establish
the convergence of a numerical scheme corresponding to an uncontrolled
diffusion. For the analogous problem involving controlled processes, it will
be necessary to consider sequences of controlled Markov chains. In this case,
besides the convergence properties of the processes we must also deal with
the convergence of the controls. When we choose an approximating Markov
chain, we essentially force the limit process to be of the desired type by
building the limiting properties directly into the chain, e.g., the consistency
conditions. However, in general, we can do little to force the limits of the
controls that are optimal for the chains to take any preassigned form, such
as feedback. This does not reflect any technical shortcoming, but is in part
due to the fact that the infimum will not be attained in many optimal
control problems within a given class of controls unless it is compact in an
appropriate sense. An example of such lack of compactness was given in
Section 4.6 for a deterministic optimal control problem.
We will often take simultaneous limits of processes and controls in the
convergence proofs that are to come. In order to guarantee the existence

of limits, it will be necessary to work with a space of controls that have


the appropriate closure property and which yield the same minimum value
function for the optimization problem we seek to approximate. The relaxed
controls form a class with such properties. Relaxed controls were first intro-
duced by L. C. Young to establish existence of a (generalized) minimizing
control for problems from the calculus of variations [157]. They were later
extended to the stochastic setting [55] and have since found several uses,
especially with regard to convergence properties [95]. The controls com-
puted for the approximating numerical problem are always of the feedback
form.

Deterministic Relaxed Controls. Consider a compact subset U of some
finite dimensional Euclidean space. For convenience we shall assume that
each control we deal with is defined on the interval [0, ∞), although it may
actually only be applied on some bounded subset. We recall some of the
definitions and notation introduced in Section 4.6. The σ-algebras B(U)
and B(U × [0, ∞)) are defined as the collection of Borel subsets of U and
U × [0, ∞), respectively. A relaxed control is then a Borel measure m(·)
such that m(U × [0, t]) = t for all t ≥ 0. We can define a derivative m_t(·),
such that

m(B) = ∫_{U×[0,∞)} I_{{(α,t)∈B}} m_t(dα) dt    (5.1)

for all B ∈ B(U × [0, ∞)) [i.e., m(dα dt) = m_t(dα) dt] and such that for each
t, m_t(·) is a measure on B(U) satisfying m_t(U) = 1. For example, we can
define m_t(·) in any convenient way for t = 0 and as the left hand derivative
for t > 0:

m_t(A) = lim_{δ→0} m(A × [t − δ, t]) / δ

for A ∈ B(U). Let R(U × [0, ∞)) denote the set of all relaxed controls on
U × [0, ∞).

A Topology for the Space of Relaxed Controls. The space R(U ×
[0, ∞)) can be metrized in a convenient way in terms of the Prohorov metric
of Section 9.1. Let π_T(·) denote the Prohorov metric on P(U × [0, T]).
For m_1, m_2 ∈ R(U × [0, ∞)), define m_i^T, i = 1, 2, to be the normalized
restrictions of these measures to U × [0, T]; i.e.,

m_i^T(B) = m_i(B) / T

for all B ∈ B(U × [0, T]). Thus, each m_i^T is always a probability measure.
We define a metric on R(U × [0, ∞)) by

d(m_1, m_2) = Σ_{j=1}^∞ 2^{−j} π_j(m_1^j, m_2^j).

Under this metric, a sequence m^n(·) in R(U × [0, ∞)) converges to m(·) ∈
R(U × [0, ∞)) if and only if m_n^j ⇒ m^j for all j < ∞. Note that this is
equivalent to

∫ φ(α, s) m^n(dα ds) → ∫ φ(α, s) m(dα ds)

for any continuous function φ(·,·) on U × [0, ∞) having compact support.
Since P(U × [0, j]) is complete, separable, and compact for all j < ∞, these
properties are inherited by the space R(U × [0, ∞)) under this metric. It
follows that any sequence of relaxed controls has a convergent subsequence.
This key property will be used often in the sequel. We will write m^n ⇒ m
for convergence in this "weak-compact" topology.
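
The role of this topology is illustrated by the classical "chattering" example. Let U = [−1, 1] and let u^n(·) be the ordinary control that alternates between +1 and −1 on successive intervals of length 1/n. The sequence u^n(·) has no limit among ordinary controls, but the relaxed control representations m^n(·) converge, in the sense just defined, to the relaxed control with derivative m_t = (δ_{−1} + δ_{+1})/2. The following fragment (illustrative only; the test function and names are ours) checks the defining convergence ∫ φ dm^n → ∫ φ dm for one continuous φ with compact support:

    import numpy as np

    # phi is continuous with compact support in s (it vanishes for s >= 1).
    phi = lambda a, s: np.exp(a) * np.maximum(1.0 - s, 0.0)

    def integral_under_un(n, grid=200_000):
        """Compute the integral of phi(alpha, s) against m^n(d alpha ds), where m^n is the
        relaxed-control representation of the control u^n alternating between +1 and -1
        on intervals of length 1/n."""
        s = (np.arange(grid) + 0.5) / grid          # quadrature points in [0, 1]
        u = np.where(np.floor(s * n) % 2 == 0, 1.0, -1.0)
        return np.mean(phi(u, s))                   # approximates the ds-integral over [0, 1]

    # Limit value: (1/2) * integral over [0,1] of [phi(1, s) + phi(-1, s)] ds
    limit = 0.5 * (np.exp(1.0) + np.exp(-1.0)) * 0.5
    for n in [1, 2, 10, 100, 1000]:
        print(f"n = {n:5d}:  integral under m^n ~ {integral_under_un(n):.5f}   (limit {limit:.5f})")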

Relation to Ordinary Deterministic Controls. We recall the problem
considered in Section 4.5. Minimize the cost

W(x, u) = ∫_0^∞ e^{−βs} k(x(s), u(s)) ds

over all U-valued measurable control processes u(·), where x(·) solves

ẋ(t) = b(x(t), u(t)),   x(0) = x.

Recall also that β > 0, and that b(·) and k(·) are assumed bounded and
continuous. As in Section 4.6, any ordinary control u(·) has the relaxed
control representation m(·), where we define

m(A × [0, t]) = ∫_0^t I_A(u(s)) ds,

in the sense that the cost function and dynamics may be rewritten as

W(x, u) = W(x, m) = ∫_0^∞ ∫_U e^{−βs} k(x(s), α) m(dα ds),    (5.2)

x(t) = x + ∫_0^t ∫_U b(x(s), α) m(dα ds).    (5.3)

Let m^n(·) denote a sequence of relaxed controls such that the associated
costs converge to the infimum over all relaxed controls. Compactness of
the space R(U × [0, ∞)) and boundedness of b(·) imply relative com-
pactness of the sequence (x^n(·), m^n(·)), where x^n(·) is the solution to the
controlled ODE under m^n(·). Suppose we retain n as the index of a conver-
gent subsequence and suppose also that the limit is denoted by (x(·), m(·)).
Owing to the weak convergence m^n(·) ⇒ m(·), x(·) solves (5.3) with the
cost (5.2). Thus, the infimum over the relaxed controls is always attained.
By the approximation theorem of Section 4.6 any relaxed control may be

arbitrarily well approximated by an ordinary control in the sense that the


costs under the two controls may be made arbitrarily close. It follows that
the infimum over the class of relaxed controls is the same as that over the
ordinary controls.

Stochastic Relaxed Controls. A stochastic relaxed control will be a
control process that is a deterministic relaxed control for each element of
the underlying probability space, and which also satisfies the nonanticipa-
tiveness condition usually assumed of ordinary stochastic controls. Recall
that our primary interest is in weak sense solutions of controlled stochastic
differential equations. Suppose we are given a probability space (Ω, F, P),
a filtration F_t, an F_t-Wiener process w(·), and an F_t-Poisson random
measure N(·). Then we say m(·) is an admissible relaxed control for the
pair (w(·), N(·)), or that the triple (m(·), w(·), N(·)) is admissible, if m(·, ω)
is a deterministic relaxed control with probability one and if m(A × [0, t])
is F_t-adapted for all A ∈ B(U). There exists a derivative m_t(·) such that
m_t(A) is F_t-adapted for all A ∈ B(U) and such that (5.1) holds with
probability one [95]. Because the space R(U × [0, ∞)) is compact, any col-
lection of relaxed controls in this space is tight. The definitions of weak
existence and weak uniqueness of controlled processes under relaxed con-
trols are analogous to those given in Section 1.3 for ordinary stochastic
controls. As noted in that section, the techniques and assumptions that are
needed to prove weak existence and weak uniqueness of solutions to an SDE
with an admissible relaxed control are essentially the same as those for the
case without control.
The following convention will simplify the notation in later chapters. Let
m(·) be an admissible relaxed control. For t ≥ 0 let m(t, ·) be the random
measure with values m(t, A) = m(A × [0, t]) for A ∈ B(U). This definition
involves an abuse of notation, but it should not cause any confusion.
10
Convergence Proofs

This chapter is the core of the mathematical part of the book. It deals
with the approximation and convergence theorems for the basic problem
classes: discounted problems with absorbing boundaries; diffusion and jump
diffusion models; optimal stopping problems, and problems where we stop
on hitting a target set and where there is no discounting. The convergence
results for the case of reflecting boundaries and the singular and ergodic
control problems will appear in the next chapter.
The chapter starts off with some approximation and limit results for
a sequence of controlled jump diffusion problems. These results and the
methods which are used provide a base for the later proofs of the con-
vergence of the numerical approximations. The first result, Theorem 1.1,
shows that the limit of a sequence of controlled jump diffusions is also a
controlled jump diffusion. The method of proof is more direct than the
usual martingale problem approach, because we have access to the driving
Wiener processes and Poisson measures. The approach is a combination of
a classical weak convergence method together with a direct construction.
The main points of the theorem are the "admissibility" of the limit controls
and the stopping or exit times. The theorem provides the basis for the ap-
proximation of relaxed controls by simpler controls. These simpler controls
will be applied to the approximating chains to get the convergence results
for the numerical methods. In particular, Theorem 1.2 shows that we can
approximate a relaxed control by a piecewise constant control which takes
values in a finite set.
Theorem 2.1 concerns convergence of sequences of controlled problems
to a limit problem, when control stops at the first moment that a given

set is exited. It is essentially a consequence of Theorem 1.1, except for


the problem of the behavior of the limit path at the first hitting time
of the boundary. There is a discussion of this point, and the concept of
"randomized stopping" is introduced. Analogous "boundary" problems will
appear when dealing with the Markov chain approximations.
Theorem 3.1 shows that we can approximate the optimal control by
a "nice" control, which is an appropriate function of the driving Wiener
process and Poisson measure at a finite number of time points, and which
is continuous in the values of the Wiener process at those time points. Such
an approximation will be needed to show that the sequence of optimal costs
Vh(x) converges to the optimal cost V(x).
Section 10.4 shows that weak limits of the sequence ψ^h(·) (defined in
Subsection 10.4.1 below and Section 4.3) of Markov process interpolations
of the approximating Markov chains are actually controlled jump diffusions,
and shows that liminf_h V^h(x) ≥ V(x). The proof introduces auxiliary pro-
cesses w^h(·) and N^h(·) which are approximations to the "limit" Wiener
process and Poisson measure, respectively. The ψ^h(·) are represented in
terms of these auxiliary processes, and this facilitates getting the desired
limit. Section 10.5 is concerned with convergence of the costs to an optimal
cost. To do this we need to show that limsup_h V^h(x) ≤ V(x). This is done
by using the particular form of the ε-optimal control for the limit problem
which is continuous in the sampled values of w(·), as derived in Section
10.3, together with the optimality of V^h(x) for the approximating chain.
In Section 10.6, the convergence of the numerical approximations for the
optimal stopping problem is proved.
Only some selected problem classes are treated, but it will be seen that
the methods are quite general and have much use elsewhere.
Local consistency is not always possible to get at all points, as seen in the
treatment of grid refinement in Section 5.5. It is also a problem when the
dynamical terms are discontinuous. It is shown in Theorem 5.3 that one
can still get convergence of the numerical algorithms under quite broad
conditions.

10.1 Limit Theorems and Approximations of


Relaxed Controls
10.1.1 Limit of a sequence of controlled diffusions
This section will be the first application of the weak convergence ideas of
Chapter 9 and will establish some methods which will be needed later.
We will be given a sequence of controlled jump diffusions, each one being
"driven" by a possibly different control and Wiener process and Poisson
measure. It will be shown that the sequence of solutions and driving pro-
cesses is tight, and that the limit of any weakly convergent subsequence is

a controlled jump diffusion process. Under a uniqueness condition, it will


follow from this that a relaxed control can be approximated by an ordinary
control which is piecewise constant. This latter result was first shown in [55]
and can also be found in [108]. We will then use this result to show that,
under a uniqueness condition, a relaxed control can be approximated by an
ordinary control which is piecewise constant and takes only finitely many
values. The approximation is in the sense of the topology of relaxed controls
as well as the closeness of the associated cost functions. These methods will
be used later to show convergence of the numerical approximations. The
weak convergence terminology will be that of Chapter 9.
The method used in Theorem 1.1 is similar to the method discussed in
Section 9.4. It uses a simple "continuity argument" to simplify the more
traditional proofs using the so-called martingale formulation. It is more
direct than the latter method for our type of problem. Similar methods
will be used when proving the convergence V^h(x) → V(x) in Sections 10.4
to 10.6.
We will use the controlled jump diffusion model of Section 1.5

x(t) = x + ∫_0^t ∫_U b(x(s), α) m_s(dα) ds + ∫_0^t σ(x(s)) dw(s)
         + ∫_0^t ∫_Γ q(x(s−), ρ) N(ds dρ).    (1.1)

When writing (1.1) or referring to a solution of (1.1), it is always im-
plied that there exists a probability space, a filtration F_t, and processes
(x(·), m(·), w(·), N(·)) such that w(·) is a standard F_t-Wiener process, N(·)
is an F_t-Poisson measure with jump rate λ and jump distribution Π(·),
m(·) is admissible with respect to (w(·), N(·)) [i.e., m(·) is F_t-adapted],
and the solution x(·) is F_t-adapted. Such a Wiener process and Poisson
measure will always be referred to as standard. Existence is always in the
weak sense, because the probability space and the w(·), N(·) are not spec-
ified a priori.
Recall the following definition.

Definition. By weak sense uniqueness for an initial condition x, we mean


that the probability law of an admissible triple determines the probabil-
ity law of any solution (x(·), m(·), w(·), N(·)) to (1.1) irrespective of the
probability space.
We will use the assumptions:

A1.1. b(·) and σ(·) are bounded and continuous.

A1.2. q(·) is measurable and bounded, and q(·, ρ) is continuous for each ρ.

A1.3. k(·), c(·), and g(·) are bounded and continuous.



A1.4. Let u(·) be an admissible ordinary control with respect to (w(·), N(·)),
and suppose that u( ·) is piecewise constant and takes only a finite number
of values. Then, for each initial condition, there exists a solution to (1.1),
where m( ·) is the relaxed control representation of u( ·), and this solution is
unique in the weak sense.

Remark. (A1.4) is equivalent to the assumption of weak sense existence


and uniqueness for each initial condition and constant control.

It is worth noting that the continuity conditions in (A1.1)-(A1.3) can be


weakened and it is often important to do so. We will comment on this at
the end of Theorem 1.1 and in Section 5. Recall the definition N(t, A) =
N([0, t] × A), the number of jumps with values in A by time t. Write
N(t) = N(t, ·) for simplicity. For future use, note that for each t, N(t) can
be constructed from the two processes

N(s, Γ),   N̄(s) = ∫_0^s ∫_Γ ρ N(dr dρ),   s ≤ t.    (1.2)

The first member is just a Poisson process with rate λ, and it identifies the
jump times. The second member can be used to get the jump values.

Notation in Theorem 1.1. In the next theorem, we assume that for
each integer n there is a probability space on which are defined a filtration
F_t^n, a pair of processes (w^n(·), N^n(·)), an admissible relaxed control m^n(·),
and a solution process x^n(·). The w^n(·), N^n(·) are a standard F_t^n-Wiener
process and an F_t^n-Poisson measure, respectively. The filtration satisfies

F_t^n ⊃ F(x^n(s), m^n(s), w^n(s), N^n(s), s ≤ t).

Thus, x^n(·) satisfies

x^n(t) = x^n(0) + ∫_0^t ∫_U b(x^n(s), α) m^n_s(dα) ds
           + ∫_0^t σ(x^n(s)) dw^n(s) + ∫_0^t ∫_Γ q(x^n(s−), ρ) N^n(ds dρ).    (1.3)
Remark on the Notation in the Theorem Statement. Note that we
let the probability space vary with the control. The basic reason for this
is that we are working with weak sense solutions and cannot always define
the solution process as an explicit function of the "driving forces," which
are the control, the Wiener process, and the Poisson measure. First, con-
sider the most classical case, where the functions b(·,o:) and u(·) satisfy a
Lipschitz condition [which is assumed to be uniform in o: for b( ·, o:) ]. Then,
given an admissible triple (m( ·), w(-), N ( ·)), one can construct a strong so-
lution x( ·) on the same probability space and as an explicit function of the

triple. However, we want to work with the largest class of controls possible,
provided that they are admissible. It is not a priori obvious that the optimal
control will in all cases be representable as a function of only (w(·), N(·)).
Thus, we might at least have to augment the probability space, in a way
that depends on the control. It should be understood that the underlying
Wiener process and Poisson measure are fictions to a large extent. They are
very useful to represent and study the processes, but when calculating cost
functions and their limits, only the distributions of the processes are im-
portant, and not the actual probability spaces or the representation of the
solution to the SDE. Under the Lipschitz condition, given the probability
law of (m(·),w(·),N(·)), and the initial condition x, the probability law of
(x(·), m(·), w(·), N(·)) is uniquely determined, and that probability law is
all that is important in computing the cost functions. Let (m^n(·), w(·), N(·))
be a sequence of admissible triples, all defined on the same probability space
with the same standard Wiener process and Poisson measure. The sequence
(or some subsequence) will not generally converge to a limit with probabil-
ity one (or even in probability) on the original probability space, and weak
convergence methods might have to be used to get appropriate limits. But
then the "limit processes" will not, in general, be definable on the original
probability space either. Because we do not want to worry about the actual
probability space, we often let it vary with the control.
If the Lipschitz condition does not hold, but the Girsanov transformation
method is used to get the controlled process from an uncontrolled process
via a transformation of the measure on the probability space, then the
Wiener process is not fixed a priori, and its construction depends on the
control. The considerations raised in the last paragraph also hold here.
These comments provide a partial explanation of the indexing of the Wiener
process and Poisson measure by n.

Theorem 1.1. Assume (A1.1) and (A1.2). Let x^n(0) ⇒ x_0 and let ν_n be
an F_t^n-stopping time. Then any sequence {x^n(·), m^n(·), w^n(·), N^n(·), ν_n}
is tight. Let (x(·), m(·), w(·), N(·), ν) denote the limit of a weakly convergent
subsequence. Define

F_t = F(x(s), m(s), w(s), N(s), ν I_{{ν≤t}}, s ≤ t).

Then w(·) and N(·) are a standard F_t-Wiener process and F_t-Poisson
measure, respectively, ν is an F_t-stopping time, m(·) is admissible with
respect to (w(·), N(·)), x(0) = x_0, and x(·) satisfies (1.1).

Proof. Tightness. The criterion of Theorem 9.2.1 will be used. Let T < ∞,
and let ν̄_n be an arbitrary F_t^n-stopping time satisfying ν̄_n ≤ T. Then, by
the properties of the stochastic integral and the boundedness of q(·) and
the jump rate λ,

E |x^n(ν̄_n + δ) − x^n(ν̄_n)|² = O(δ),

where the order O(δ) is uniform in ν̄_n. Thus, by Theorem 9.2.1, the sequence
{x^n(·)} is tight. The sequences of controls {m^n(·)} and stopping times {ν_n}
are tight because their range spaces [R(U × [0, ∞)) and [0, ∞], respectively]
are compact. Clearly {w^n(·), N^n(·)} is tight and any weak limit has the
same law as each of the (w^n(·), N^n(·)) pairs has.
Characterization of the limit processes. Now that we have tightness, we
can extract a weakly convergent subsequence and characterize its limit.
For notational convenience, let the original sequence converge weakly, and
denote the limit by (x(·), m(·), w(·), N(·), ν). Because the processes w^n(·)
have continuous paths with probability one, so will w(·). It follows from
the weak convergence that m(t, U) = t for all t. We want to show that
the limit x(·) is a solution to a stochastic differential equation with driving
processes (m(·), w(·), N(·)). This will be done by a combination of a fairly
direct method and a use of the martingale method.
Let δ > 0 and let k be a positive integer. For any process z(·) with paths
in D^k[0, ∞), define the piecewise constant process z_δ(·) by

z_δ(t) = z(iδ),   t ∈ [iδ, iδ + δ).

By the tightness of {x^n(·)} and the boundedness and continuity in (A1.1),
(A1.2), we can write (1.3) as

x^n(t) = x^n(0) + ∫_0^t ∫_U b(x^n(s), α) m^n_s(dα) ds
           + ∫_0^t σ(x^n_δ(s)) dw^n(s) + ∫_0^t ∫_Γ q(x^n_δ(s−), ρ) N^n(ds dρ) + ε^n_{δ,t},    (1.4)

where

E|ε^n_{δ,t}| → 0 as δ → 0,

uniformly in n and in t in any bounded interval.
For the rest of the proof, we assume that the probability spaces are chosen
as required by the Skorokhod representation (Theorem 9.1.7), so that we
can suppose that the convergence of {x^n(·), m^n(·), w^n(·), N^n(·), ν_n} to its
limit is with probability one in the topology of the path spaces of the
processes. Thus,

(x^n(·), w^n(·), N^n(·)) → (x(·), w(·), N(·))

uniformly on any bounded time interval with probability one. The sequence
{m^n(·)} converges in the "compact-weak" topology. In particular, for any
continuous and bounded function φ(·) with compact support,

∫ φ(α, s) m^n(dα ds) → ∫ φ(α, s) m(dα ds).

Now the Skorokhod representation and weak convergence imply that

∫_0^t ∫_U b(x(s), α) m^n_s(dα) ds − ∫_0^t ∫_U b(x(s), α) m_s(dα) ds → 0

uniformly in t on any bounded interval with probability one.
Owing to the fact that the x^n_δ(·) are constant on the intervals [iδ, iδ + δ),
the third and fourth terms on the right side of (1.4) converge to, respec-
tively,

∫_0^t σ(x_δ(s)) dw(s),   ∫_0^t ∫_Γ q(x_δ(s−), ρ) N(ds dρ),

which are well defined with probability one since they can be written as
finite sums. Thus, we can write

x(t) = x_0 + ∫_0^t ∫_U b(x(s), α) m_s(dα) ds
         + ∫_0^t σ(x_δ(s)) dw(s) + ∫_0^t ∫_Γ q(x_δ(s−), ρ) N(ds dρ) + ε_{δ,t},    (1.5)

where E|ε_{δ,t}| → 0 as δ → 0, uniformly on bounded t-intervals.
We next characterize w(·). We know that it is a Wiener process, but we
need to show that it is an F_t-Wiener process, where F_t is defined in the
theorem statement. Let H(·) be a real-valued and continuous function of
its arguments, and with compact support. Let φ(·) and φ_j(·) be real-valued
and continuous functions of their arguments and with compact support.
Define the function¹

(φ, m)_t = ∫_0^t ∫_U φ(α, s) m(dα ds).

Let p, t, u, t_i, i ≤ p, be given such that t_i ≤ t ≤ t + u, i ≤ p, and P{ν =
t_i} = 0. For q = 1, 2, ..., let {Γ_j^q, j ≤ q} be a sequence of nondecreasing
partitions of Γ such that Π(∂Γ_j^q) = 0 for all j and all q, where ∂Γ_j^q is the
boundary of the set Γ_j^q. As q → ∞, let the diameters of the sets Γ_j^q go to
zero. Let the Wiener processes be ℝ^{r'}-valued, and let f(·) ∈ C_0^2(ℝ^{r'}).

¹ The function (φ, m)_t of m(·) is introduced since any continuous function of
m(·) can be arbitrarily well approximated by continuous functions of the type

h((φ_j, m)_{t_i}, j ≤ q, i ≤ p),

for appropriate {t_i} and continuous h(·) and φ_j(·) with compact support.

Define the differential operator ℒ_w of the Wiener process:

ℒ_w f(w) = (1/2) Σ_{i=1}^{r'} f_{w_i w_i}(w).

By the fact that w^n(·) is an F_t^n-Wiener process, we have

E H(x^n(t_i), w^n(t_i), (φ_j, m^n)_{t_i}, N^n(t_i, Γ_j^q), j ≤ q, i ≤ p, ν_n I_{{ν_n≤t}})
  × [ f(w^n(t + u)) − f(w^n(t)) − ∫_t^{t+u} ℒ_w f(w^n(s)) ds ] = 0.    (1.6)

By the probability one convergence which is implied by the Skorokhod
representation,

E | ∫_t^{t+u} ℒ_w f(w^n(s)) ds − ∫_t^{t+u} ℒ_w f(w(s)) ds | → 0.

Using this result and taking limits in (1.6) yields

E H(x(t_i), w(t_i), (φ_j, m)_{t_i}, N(t_i, Γ_j^q), j ≤ q, i ≤ p, ν I_{{ν≤t}})
  × [ f(w(t + u)) − f(w(t)) − ∫_t^{t+u} ℒ_w f(w(s)) ds ] = 0.    (1.7)

The set of random variables

H(x(t_i), w(t_i), (φ_j, m)_{t_i}, N(t_i, Γ_j^q), j ≤ q, i ≤ p, ν I_{{ν≤t}}),

as H(·), p, q, φ_j(·), and t_i vary over all possibilities, induces the σ-alge-
bra F_t. Thus, (1.7) implies that

f(w(t)) − ∫_0^t ℒ_w f(w(s)) ds

is an F_t-martingale for all f(·) of the chosen class. Thus, w(·) is a standard
F_t-Wiener process.
We now turn our attention to showing that N(·) is an F_t-Poisson mea-
sure. Let θ(·) be a continuous function on Γ, and define the process

θN(t) = ∫_0^t ∫_Γ θ(ρ) N(ds dρ).

By an argument which is similar to that used for the Wiener process above,
if f(·) is a continuous function with compact support, then

E H(x(t_i), w(t_i), (φ_j, m)_{t_i}, N(t_i, Γ_j^q), j ≤ q, i ≤ p, ν I_{{ν≤t}})
  × [ f(θN(t + u)) − f(θN(t))
      − λ ∫_t^{t+u} ∫_Γ [f(θN(s) + θ(ρ)) − f(θN(s))] Π(dρ) ds ] = 0.    (1.8)



Equation (1.8) and the arbitrariness of H(·), p, q, t_i, φ_j(·), Γ_j^q, f(·), and θ(·)
imply that N(·) is an F_t-Poisson measure.
The facts that w(·) and N(·) are an F_t-Wiener process and Poisson
measure, respectively, imply their mutual independence. By construction,
{ν ≤ t} ⊂ F_t, which implies that ν is an F_t-stopping time. Since for each t,
F_t measures {x(s), m(s), w(s), N(s), s ≤ t}, the control m(·) is admissible
and x(·) is nonanticipative with respect to (w(·), N(·)). We can now take
limits as δ → 0 in (1.5) to complete the proof of the theorem. ∎

Remark on Discontinuous Dynamical and Cost Terms. Theorem
9.1.5 allows us to weaken the conditions on b(·), k(·), and σ(·). Suppose that
(x^n(·), m^n(·), w^n(·), N^n(·)) converges weakly to (x(·), m(·), w(·), N(·)). Let
us use the Skorokhod representation so that we can assume probability one
convergence. Let σ(·) be continuous.
Suppose that b(x, α) takes either of the forms b_0(x) + b_1(α) or b_0(x)b_1(α),
or is a product or sum of such terms, where the control dependent terms are
continuous and the x-dependent terms are measurable. It is only necessary
that

∫_0^t b_0(x^n(s)) ds → ∫_0^t b_0(x(s)) ds

with probability one for each t. Let D_b denote the set of points where b_0(·)
is discontinuous. The desired convergence will hold if ∫_0^t b_0(φ(s)) ds is a continuous function
from D^r[0, T] to ℝ^r for each T, with probability one with respect to the
measure induced by x(·). A sufficient condition is that, for each t,

lim_{ε→0} ∫_0^t P {x(s) ∈ N_ε(D_b)} ds = 0,

where N_ε(D_b) denotes the ε-neighborhood of D_b.
If the set of discontinuity is a uniformly smooth surface of lower dimension,
then nondegeneracy of a(x) for each x is sufficient.
For the control problems below, there are analogous weakenings of the
conditions on the cost functions k(·) and g(·). Similar considerations ap-
ply to the convergence of the numerical approximations. See, for example,
Theorem 5.3.

10.1. 2 An approximation theorem for relaxed controls


Suppose that (x(·),m(·),w(·),N(·)) solves (1.1), where the relaxed control
m(·) is admissible with respect to (w(·),N(·)). In some of the technical
arguments in the sequel it is important to be able to approximate a re-
laxed control by a simple ordinary control. Under a uniqueness condition,
this can be done with the approximation arguments of Theorem 1.1, with
appropriate choices of the mn(-).

The proof of the following theorem is in [95, Theorem 3.5.2]. In the theorem, the processes indexed by $\epsilon$ and the limit process are assumed to
be defined on the same probability space, but this can always be done via
the Skorokhod representation. The value of the cost function depends on
the joint distribution of (x(·},m(·)). In order to simplify the notation, we
write the cost function only as a function of m( ·) and the initial condition
x. The brevity of the notation should not cause any confusion.

Theorem 1.2. Assume (A1.1)–(A1.4) and let $x(0) = x$. For a given admissible triple $(m(\cdot), w(\cdot), N(\cdot))$, let the solution $x(\cdot)$ to (1.1) exist and be unique in the weak sense. For any finite $T$ and $\beta > 0$, define the cost functions
$$W_T(x,m) = E_x^m\int_0^T\!\!\int_U k(x(s),\alpha)\,m(d\alpha\,ds) + E_x^m g(x(T)), \tag{1.9}$$
$$W(x,m) = E_x^m\int_0^\infty\!\!\int_U e^{-\beta s} k(x(s),\alpha)\,m(d\alpha\,ds). \tag{1.10}$$
Given $\epsilon > 0$, there is a finite set $\{\alpha_1,\ldots,\alpha_{k_\epsilon}\} = U^\epsilon \subset U$, and a $\delta > 0$, with the following properties. There is a probability space on which are defined processes
$$(x^\epsilon(\cdot), u^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot)), \tag{1.11}$$
where $w^\epsilon(\cdot)$ and $N^\epsilon(\cdot)$ are our standard Wiener process and Poisson measure, respectively, and $u^\epsilon(\cdot)$ is an admissible $U^\epsilon$-valued ordinary control which is constant on the intervals $[i\delta, i\delta+\delta)$. Furthermore, the processes (1.11) satisfy (1.1) and
$$P_x^m\Big\{\sup_{t\le T}|x^\epsilon(t) - x(t)| > \epsilon\Big\} \le \epsilon,$$
$$|W(x,m) - W(x,u^\epsilon)| \le \epsilon. \tag{1.12}$$
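As an illustration only, the following minimal sketch shows the kind of control produced by Theorem 1.2: an arbitrary control path is sampled on the grid $\{i\delta\}$ and quantized to a finite set $U^\epsilon$. The control path, the grid, and the value set are hypothetical inputs supplied by the reader; the probabilistic estimates of the theorem are of course not reproduced by this path-by-path construction.

import numpy as np

def piecewise_constant_approximation(u, T, delta, u_eps):
    """Approximate a control path u: [0, T] -> U by a U^eps-valued control
    that is constant on the intervals [i*delta, i*delta + delta)."""
    times = np.arange(0.0, T, delta)                 # left endpoints i*delta
    u_eps = np.asarray(u_eps, dtype=float)
    values = []
    for t in times:
        target = u(t)                                # sample the original control
        values.append(u_eps[np.argmin(np.abs(u_eps - target))])  # nearest admissible value
    def u_approx(t):
        i = min(int(t // delta), len(values) - 1)    # constant on [i*delta, (i+1)*delta)
        return values[i]
    return u_approx

# Example: approximate u(t) = sin(t) on [0, 3] with delta = 0.5 and U^eps = {-1, -0.5, 0, 0.5, 1}.
u_eps_control = piecewise_constant_approximation(np.sin, 3.0, 0.5, [-1, -0.5, 0, 0.5, 1])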

10.2 Existence of an Optimal Control


By Theorem 1.1, there exists an optimal control for the cost functions (1.9) and (1.10) because Theorem 1.1 shows that the limit of a weakly
convergent minimizing sequence of controlled processes is also a controlled
process, and the sequence of costs converges to that of the limit process.
When the controlled process is stopped at the time of first exit from some
set or first entrance into some target or absorbing set, an additional compli-
cation arises, because the first exit times of the sequence of approximating
processes might not converge to the first exit time of the limit process.
Similar questions will arise when dealing with the convergence of the costs
for the numerical algorithm. This problem was also discussed in Chapter
9. In this section, we will work with the cost function
$$W(x,m) = E_x^m\int_0^\tau\!\!\int_U e^{-\beta s}k(x(s),\alpha)\,m(d\alpha\,ds) + E_x^m e^{-\beta\tau}g(x(\tau)), \tag{2.1}$$
where $\beta \ge 0$ and $\tau$ is the first escape time from the set $G^0$, the interior of the set $G$ satisfying (A2.1) below. The discussion concerning continuity of the escape time below is applicable to general cost functions and not just to the discounted cost function.
Using the notation of Theorem 1.1, let $\{x^n(\cdot), m^n(\cdot), w^n(\cdot), N^n(\cdot), \tau_n\}$ be a minimizing sequence, that is,
$$W(x, m^n) \to V(x) = \inf_m W(x,m), \tag{2.2}$$
and $\tau_n = \inf\{t : x^n(t) \notin G^0\}$. By the results of Theorem 1.1 we know that $\{x^n(\cdot), m^n(\cdot), w^n(\cdot), N^n(\cdot), \tau_n\}$ has a weakly convergent subsequence. For notational simplicity, suppose that the original sequence itself converges weakly and that $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$ denotes the limit. Define the filtration
$$\mathcal{F}_t = \mathcal{F}\big(x(s), m(s), w(s), N(s),\ s \le t,\ \bar\tau I_{\{\bar\tau \le t\}}\big).$$
Then by Theorem 1.1, $w(\cdot)$ and $N(\cdot)$ are a standard $\mathcal{F}_t$-Wiener process and Poisson measure, respectively. Also, $m(\cdot)$ is admissible, (1.1) holds, and $\bar\tau$ is an $\mathcal{F}_t$-stopping time. If either $\beta > 0$ or the $\{\tau_n\}$ are uniformly integrable, then under the continuity and boundedness of $k(\cdot)$ and $g(\cdot)$, it is always the case that
$$E_x^{m^n}\int_0^{\tau_n}\!\!\int_U e^{-\beta s}k(x^n(s),\alpha)\,m_s^n(d\alpha)\,ds \to E_x^m\int_0^{\bar\tau}\!\!\int_U e^{-\beta s}k(x(s),\alpha)\,m_s(d\alpha)\,ds, \tag{2.3}$$
$$E_x^{m^n} e^{-\beta\tau_n} g(x^n(\tau_n)) \to E_x^m e^{-\beta\bar\tau} g(x(\bar\tau)).$$
If $\bar\tau = \tau$, the time of first exit of the limit $x(\cdot)$ from $G^0$, then we would have
$$V(x) = W(x,m), \tag{2.4}$$
and the existence of an optimal control for the cost function (2.1) would be proved.

Continuity of the First Exit Time. Unfortunately, it is not always the case that the limit $\bar\tau$ is the first exit time of $x(\cdot)$. In order to better understand the problem, refer to the deterministic case illustrated in Figure 10.1.

Figure 10.1. Continuity of first exit times.

In the figure, the sequence of functions $\phi^n(\cdot)$ converges to the limit function $\phi^0(\cdot)$, but the sequence of first contact times of $\phi^n(\cdot)$ converges to a time which is not the moment of first contact of $\phi^0(\cdot)$ with the boundary line $\partial G$. From the illustration, we can see that the problem in this case is that the limit function is tangent to $\partial G$ at the time of first contact.
For our control problem, if the values $W(x, m^n)$ are to converge to the value $W(x,m)$, then we need to assure (at least with probability one) that the paths of the limit $x(\cdot)$ are not "tangent" to $\partial G$ at the moment of first exit from $G^0$. Let us now define our requirement more precisely. For $\phi(\cdot)$ in $D^r[0,\infty)$ (with the Skorokhod topology used), define the function $f(\phi)$, with values in the compactified infinite interval $[0,\infty]$, by $f(\phi) = \infty$ if $\phi(t) \in G^0$ for all $t < \infty$, and otherwise
$$f(\phi) = \inf\{t : \phi(t) \notin G^0\}. \tag{2.5}$$
In the example of Figure 10.1, $f(\cdot)$ is not continuous at the path $\phi^0(\cdot)$.
If the path $\phi^0(\cdot)$ which is drawn in the figure were actually a sample path of a Wiener process, then the probability is zero that it would be "tangent" to $\partial G$ at the point of first contact. This is a consequence of the law of the iterated logarithm for the Wiener process or, more intuitively, because of the "local wild nature" of the Wiener process. It would cross the boundary infinitely often in any small interval about its first point of contact with a smooth boundary. The situation would be similar if the Wiener process were replaced by the solution to a stochastic differential equation with a uniformly positive definite covariance matrix $a(x)$. This was illustrated in Section 9.4, where it was shown that this "tangency" could not happen due to the law of the iterated logarithm. If such a process were the limit of the $\{x^n(\cdot)\}$ introduced above and the boundary $\partial G$ were "smooth" (see the remarks below), then we would have $\tau_n \to \tau$, where $\tau$ is the first hitting time of the boundary. If, in addition, $\beta > 0$ or $\{\tau_n\}$ is uniformly integrable, then
$$W(x, m^n) \to W(x,m). \tag{2.6}$$

If the original sequence were minimizing, then W(x, m) = V(x). The same
"boundary" considerations arise when proving the convergence of the nu-
merical approximations Vh(x) to V(x). We will next give conditions which
will guarantee the convergence (2.6).

Convergence of the First Exit Time. The following conditions on the set $G$ will be used:

A2.1. The set $G$ is compact and is the closure of its interior $G^0$.

A2.2. The function $f(\cdot)$ is continuous (as a map from $D^r[0,\infty)$ to the compactified interval $[0,\infty]$) with probability one relative to the measure induced by any solution to (1.1) for the initial condition $x$ of interest.

For the purposes of the convergence theorems for the numerical approx-
imations starting in Section 10.3, (A2.2) can be weakened as follows:

A2.2'. For each $\epsilon_0 > 0$ and initial condition $x$ of interest, there is an $\epsilon_0$-optimal process $(x(\cdot), m(\cdot), w(\cdot), N(\cdot))$ satisfying (1.1), which is unique in the weak sense, and such that $f(\cdot)$ is continuous with probability $\ge 1 - \epsilon_0$ with respect to the measure of the solution $x(\cdot)$.

By $\epsilon_0$-optimality of $m(\cdot)$, we mean that $W(x,m) \le V(x) + \epsilon_0$.

Remark on (A2.2). (A2.2) and (A2.2') are stated as they are because
little is usually known about the e-optimal processes. Such conditions are
satisfied in many applications. The main purpose is the avoidance of the
"tangency" problem discussed above. The tangency problem would appear
to be a difficulty with all numerical methods, since they all depend on
some sort of approximation, and implicitly or explicitly one seems to need
some "robustness" of the boundary conditions in order to get the desired
convergence. In particular, the convergence theorems for the classical fi-
nite difference methods for elliptic and parabolic equations generally use a
nondegeneracy condition on a(x) in order to (implicitly) guarantee (A2.2).
The nature of the dynamical equation (1.1) often implies the continuity of $f(\cdot)$ with probability one, owing essentially to the local "wildness" of the Wiener process. Let us consider a classical case. Let $a(x)$ be uniformly positive definite in $G$, and let $\partial G$ satisfy the following "open cone condition": there are $\epsilon_0 > 0$, $\epsilon > 0$, and open (exterior) cones $C(y)$ of radius $\epsilon_0$ at unit distance from the origin such that for each $y \in \partial G$ we have
$$\{x : x - y \in C(y),\ |y - x| < \epsilon\} \cap G = \emptyset.$$
Then by [49, Theorem 13.8], (A2.2) holds.
The checking of (A2.2) for the degenerate case is more complicated, and
one usually needs to take the particular structure of the individual case into
account. An important class of degenerate examples is illustrated in [90, pp. 64-66]. The boundary can often be divided into several pieces, where we are able to treat each piece separately. For example, there might be a segment where a "directional nondegeneracy" of the diffusion term $a(x) = \sigma(x)\sigma'(x)$ guarantees the almost sure continuity of the exit times of the paths which exit on that segment, a segment where the direction of the drift gives a similar guarantee, a segment on which escape is not possible, and the complement of the above parts. Frequently, the last "complementary" set is a finite set of points or a curve of lower dimension than the boundary. Special considerations concerning these points can often resolve the issue there. In the two dimensional example cited above from [90], $G$ is the symmetric square box centered about the origin and the system is ($x = (x_1, x_2)$)
$$dx_1 = x_2\,dt, \qquad dx_2 = u\,dt + dw,$$

where the control u( ·) is bounded. The above cited "complementary set" is


just the two points which are the intersections of the horizontal axis with
the boundary, and these points can be taken care of by a test such as that
in Theorem 6.1 of [143].
The boundaries in control problems are often somewhat "flexible." In
many cases, they are introduced simply in order to bound the state space.
The original control problem might be defined in an unbounded space, but
the space truncated for numerical reasons. Even if there is a given ''target,"
it is often not necessary to fix the target set too precisely. These consid-
erations give us the freedom to vary the boundary slightly. This freedom
suggests the "randomized stopping" alternative discussed in the paragraph
after the next.

The Girsanov Measure Transformation Technique and Continuity of the Stopping Times. Define the set $A = \{\phi(\cdot) : f(\cdot) \text{ is continuous at } \phi(\cdot)\}$. Then $A$ is an open (hence measurable) set in $D^r[0,\infty)$ in the Skorokhod topology. Suppose that for some jump diffusion process $x(\cdot)$ satisfying (1.1), the conditions required in Section 1.3 for the Girsanov measure transformation to be usable for modifying the drift are satisfied. Let $P$ denote the measure which induces $x(\cdot)$ on $D^r[0,\infty)$ and suppose that $P\{A\} = 1$. Now let us modify the control via the measure transformation and obtain a new process satisfying (1.1) for that control, with associated measure $\tilde P$ satisfying $\tilde P \ll P$. Then $\tilde P\{A\} = 1$. Thus, if (A2.2) holds for one control it holds for all controls.

An Alternative to (A2.2). If the set G can be altered even very slightly,


then there is a satisfactory alternative stopping rule which accomplishes
the same purpose as (A2.2). This rule is called randomized stopping. Under
randomized stopping, the probability of stopping at some time t (if the
process has not yet stopped) goes to unity as the state value x(t) at that
time approaches $\partial G$. This will now be formalized.

Randomized Stopping. For some small $\epsilon > 0$, let $\lambda(\cdot) > 0$ be a continuous function on the set $N_\epsilon(\partial G) \cap G^0$, where $N_\epsilon(\partial G)$ is the $\epsilon$-neighborhood of the boundary. Let $\lambda(x) \to \infty$ as $x$ converges to $\partial G$. Then we will stop the controlled process $x(\cdot)$ at time $t$ with stopping rate $\lambda(x(t))$ and stopping cost $g(x(t))$.
As far as the costs are concerned, randomized stopping is equivalent to adding an additional and state dependent discount factor. The cost is replaced by
$$\begin{aligned} W(x,m) = {}& E_x^m\int_0^\tau\!\!\int_U \exp\Big[-\beta s - \int_0^s\lambda(x(u))\,du\Big]k(x(s),\alpha)\,m_s(d\alpha)\,ds \\ &+ E_x^m\int_0^\tau \exp\Big[-\beta s - \int_0^s\lambda(x(u))\,du\Big]\lambda(x(s))\,g(x(s))\,ds. \end{aligned} \tag{2.7}$$
The randomized stopping rule can be applied to the approximating Markov chain $\{\xi_n^h, n < \infty\}$ in the same way: stop the chain at step $n$ with probability $1 - \exp(-\lambda(\xi_n^h)\Delta t_n^h)$ for $\xi_n^h \in G^0$, and with probability one if $\xi_n^h \notin G^0$. The stopping cost is $g(\xi_n^h)$. The computational problem is altered only slightly. For example, if $\Delta t^h(x,\alpha)$ does not depend on $\alpha$, then the dynamic programming equation (5.8.3) becomes
$$V^h(x) = \big(1 - e^{-\lambda(x)\Delta t^h(x)}\big)g(x) + e^{-\lambda(x)\Delta t^h(x)}\times\text{right side of (5.8.3)}, \tag{2.8}$$
for $x \in G_h$, and with the boundary condition $V^h(x) = g(x)$ for $x \notin G_h$.
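To make the modification concrete, the following sketch shows one value-iteration sweep for (2.8) on a finite grid, assuming the interpolation interval does not depend on the control. The transition probabilities, intervals, stopping rate, and cost data are hypothetical placeholders that would come from the particular approximating chain; the sketch only indicates where the randomized-stopping factor enters the iteration.

import numpy as np

def sweep_randomized_stopping(V, ph, dth, lam, k, g, beta, interior):
    """One value-iteration sweep for (2.8), assuming dt^h(x) is control-free.

    V        : current value estimates, one entry per grid point
    ph       : ph[a][i, j] = transition probability i -> j under control a
    dth      : interpolation interval dt^h(x) at each grid point
    lam      : randomized stopping rate lambda(x) at each grid point
    k, g     : running cost k(x, a) (array indexed [a, i]) and stopping cost g(x)
    beta     : discount rate
    interior : boolean mask of points in G_h (False means V = g there)
    """
    V_new = g.copy()                                  # boundary condition off G_h
    for i in np.where(interior)[0]:
        best = np.inf
        for a in range(len(ph)):                      # "right side of (5.8.3)": minimize over controls
            cont = np.exp(-beta * dth[i]) * ph[a][i].dot(V)
            best = min(best, k[a, i] * dth[i] + cont)
        stop_prob = 1.0 - np.exp(-lam[i] * dth[i])    # randomized stopping factor
        V_new[i] = stop_prob * g[i] + (1.0 - stop_prob) * best
    return V_new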
The Convergence Theorem. Theorem 1.1 and the above discussion yield
the following result:

Theorem 2.1. Assume (A1.1)–(A1.3), (A2.1), and that the cost (2.1) is used. Let either $\beta > 0$ or $\{\tau_n\}$ be uniformly integrable. Let
$$\{x^n(\cdot), m^n(\cdot), w^n(\cdot), N^n(\cdot), \tau_n\}$$
be a sequence of solutions to (1.1) and associated stopping times which converges weakly to $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$. Let (A2.2) hold for the limit. Then $\bar\tau = \tau$ with probability one and (2.6) holds. If the sequence is minimizing, then (2.4) holds. Let $k(x) \ge 0$ and $g(x) = 0$, and let the sequence be minimizing. Then
$$\liminf_n W(x, m^n) \ge W(x,m)$$
and (2.4) holds without (A2.2), where $W(x,m)$ is the cost for the limit process whether or not the solution to (1.1) is unique.

Assume the randomized stopping in lieu of (A2.2), and replace the $\{\tau_n\}$ above with the (no larger) randomized stopping times. Then the assertions in the first paragraph remain true.

10.3 Approximating the Optimal Control


By Theorem 1.2, a relaxed control can be approximated by a simple ordi-
nary control. For the proof of convergence of the numerical approximations
Vh(x) to the desired limit V(x) in Sections 10.4 to 10.6 below, we will
need an approximating control with a "finite dependence" on the "past" of
the Wiener process and Poisson measure and which has certain continuity
properties. These continuity properties will allow the use of weak conver-
gence methods when applying these "comparison" controls to the Markov
chains. The existence of such a control will be proved in Theorem 3.1 below.

Remark on Notation. In the theorem below, we use various standard Wiener processes and Poisson measures, which will be denoted by $w^\gamma(\cdot)$ and $N^\gamma(\cdot)$, respectively, for appropriate scalar or vector values of the superscript $\gamma$. When a control $m^\gamma(\cdot)$ has the same superscript, it is assumed to be admissible with respect to the first pair. Also, $x^\gamma(\cdot)$ and $\tau^\gamma$ will denote the associated solution process and first exit time from $G^0$, respectively. The initial condition of all $x^\gamma(\cdot)$ and $x(\cdot)$ is $x$. For any ordinary stochastic control $u^\gamma(\cdot)$ we use $m^\gamma(\cdot)$ for its relaxed control representation.

Theorem 3.1. Assume (A1.1)–(A1.4), (A2.1), and (A2.2') and use the cost function (2.1) with $\beta > 0$. Fix $\epsilon_0 > 0$, and let $(x(\cdot), m(\cdot), w(\cdot), N(\cdot))$ be an $\epsilon_0$-optimal solution whose existence is asserted in (A2.2'). Let $\tau$ denote the first escape time from $G^0$. Then, for each $\epsilon > 0$, there is a $\delta > 0$ and a probability space on which are defined a pair $(w^\epsilon(\cdot), N^\epsilon(\cdot))$, a control $u^\epsilon(\cdot)$ of the type introduced in Theorem 1.2, and a solution $x^\epsilon(\cdot)$ such that
$$|W(x, m^\epsilon) - W(x,m)| \le \epsilon. \tag{3.1}$$
There is $\theta > 0$ and a partition $\{\Gamma_j, j \le q\}$ of $\Gamma$ such that the approximating $u^\epsilon(\cdot)$ can be chosen so that its probability law at any time $n\delta$, conditioned on $\{w^\epsilon(s), N^\epsilon(s), s \le n\delta,\ u^\epsilon(i\delta), i < n\}$, depends only on the initial condition $x = x(0)$ and on the samples
$$\{w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j),\ j \le q,\ p\theta < n\delta\}, \tag{3.2}$$
and is continuous in the $x, w^\epsilon(p\theta)$ arguments for each value of the other arguments. If the set of stopping times over any set of controls with uniformly bounded costs is uniformly integrable, then the above conclusions continue to hold for $\beta = 0$.

If randomized stopping is used in lieu of the continuity of the exit times


in (A2.2'), then the above assertions still hold.

Comment on the Proof. Only the first paragraph will be proved. In-
equality (3.1) and the statements above it are essentially Theorem 1.2. The
only modifications concern the presence of a stopping boundary. The asser-
tions (3.1) and above yield an approximating control u'(·) which takes only
finitely many values and is constant on the time intervals [n8, n8 + 8). To
get the form of the control whose existence is asserted below (3.1), we start
with the control u'(·) constructed above and modify it in several stages.
The desired control will be defined via its conditional probability law, given
the past values of the control and the driving Wiener process and Poisson
measure. First it is shown, via use of the martingale convergence theorem,
that the conditional probability law can be arbitrarily well approximated
by a conditional probability law that depends on only a finite number of
samples of the driving processes. In order to get the asserted continuity in
the samples of the Wiener process, a mollifier is applied to the conditional
probability law obtained in the step above.

Proof. Part 1. Let $\epsilon > 0$. By Theorems 1.1 and 1.2, there are $\delta > 0$, a finite set $U^\epsilon \subset U$, and a probability space on which are defined a solution to (1.1), namely $(x^\epsilon(\cdot), u^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$, where $u^\epsilon(\cdot)$ is $U^\epsilon$-valued and constant on the intervals $[n\delta, n\delta+\delta)$. Also, $(x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$ approximates $(x(\cdot), m(\cdot), w(\cdot), N(\cdot))$ in the sense that, as $\epsilon \to 0$,
$$(x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot)) \Rightarrow (x(\cdot), m(\cdot), w(\cdot), N(\cdot)). \tag{3.3}$$
Let $\tau^\epsilon = \inf\{t : x^\epsilon(t) \notin G^0\}$. There is a stopping time $\bar\tau$ such that
$$(x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot), \tau^\epsilon) \Rightarrow (x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau). \tag{3.4}$$

By assumption (A2.2'), there is $\epsilon_1 > 0$, which can be supposed to be arbitrarily small, such that $f(\cdot)$ is continuous with probability greater than $1 - \epsilon_1$ with respect to the measure of the $x(\cdot)$ process. Let $a > 0$. For various values of $\gamma$ to be used below, we will consider processes $x^\gamma(\cdot)$ that converge weakly to $x(\cdot)$. The Skorokhod representation will be used, so that the convergence is with probability one. Let $\nu_a^\gamma$ denote the probability of the set of paths of the $x^\gamma(\cdot)$ process for which $|f(x^\gamma) - \tau| \ge a$. By the previous assertions in this paragraph,
$$\limsup_\epsilon \nu_a^\epsilon \le \epsilon_1. \tag{3.5}$$
Since $a > 0$ is arbitrary, this, (3.4), and the discounting imply that
$$\limsup_\epsilon |W(x, m^\epsilon) - W(x,m)| \le \delta_{\epsilon_1}, \tag{3.6}$$



where $\delta_{\epsilon_1} \to 0$ as $\epsilon_1 \to 0$.
Part 2. In order to get the representation asserted below (3.1), we will start with a $u^\epsilon(\cdot)$ of the type just described in Part 1 for small $\epsilon$, and modify it slightly to get the desired continuity and "finite dependence" properties. This will be the $u^{\epsilon\theta\rho}(\cdot)$ defined below. Let $0 < \theta < \delta$. Let the $\{\Gamma_j, j \le q\}$ be partitions of $\Gamma$ of the type used in the proof of Theorem 1.1, and let $q \to \infty$ as $\theta \to 0$. For $\alpha \in U^\epsilon$, define the function $F_{n\theta}$ as the regular conditional probability
$$\begin{aligned} F_{n\theta}\big(\alpha; x, u^\epsilon(i\delta), i < n,\ & w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big) \\ &= P\{u^\epsilon(n\delta) = \alpha \mid x, u^\epsilon(i\delta), i < n,\ w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j), j \le q, p\theta < n\delta\}. \end{aligned} \tag{3.7}$$

By the uniqueness assumption (A1.4), the probability law of $(x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$ is determined by the initial condition $x = x(0)$ and the probability law of $(m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$. Because the $\sigma$-algebra determined by the set $\{u^\epsilon(i\delta), i < n,\ w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j), j \le q, p\theta < n\delta\}$ increases to the $\sigma$-algebra determined by $\{u^\epsilon(i\delta), i < n,\ w^\epsilon(s), N^\epsilon(s), s \le n\delta\}$ as $\theta \to 0$, the martingale convergence theorem implies that, for each $n$, $\alpha$, and $\delta$,
$$F_{n\theta}\big(\alpha; x, u^\epsilon(i\delta), i < n,\ w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big) \to P\{u^\epsilon(n\delta) = \alpha \mid x, u^\epsilon(i\delta), i < n,\ w^\epsilon(s), N^\epsilon(s), s \le n\delta\}$$
with probability one, as $\theta \to 0$.
For given $(w^{\epsilon\theta}(\cdot), N^{\epsilon\theta}(\cdot))$, define the control $u^{\epsilon\theta}(\cdot)$ by the conditional probability
$$P\{u^{\epsilon\theta}(n\delta) = \alpha \mid x, u^{\epsilon\theta}(i\delta), i < n,\ w^{\epsilon\theta}(s), N^{\epsilon\theta}(s), s \le n\delta\} = F_{n\theta}\big(\alpha; x, u^{\epsilon\theta}(i\delta), i < n,\ w^{\epsilon\theta}(p\theta), N^{\epsilon\theta}(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big). \tag{3.8}$$
By the construction of the control law, as $\theta \to 0$,
$$(m^{\epsilon\theta}(\cdot), w^{\epsilon\theta}(\cdot), N^{\epsilon\theta}(\cdot)) \Rightarrow (m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot)).$$
The solution to (1.1) exists (on some probability space) and is (weak sense) unique when the control is piecewise constant and takes only a finite number of values. Using this, we get
$$(x^{\epsilon\theta}(\cdot), m^{\epsilon\theta}(\cdot), w^{\epsilon\theta}(\cdot), N^{\epsilon\theta}(\cdot)) \Rightarrow (x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$$
as $\theta \to 0$. Also, by the weak convergence, $x^{\epsilon\theta}(\cdot) \Rightarrow x(\cdot)$ as $(\epsilon,\theta) \to (0,0)$, and
$$\limsup_{\epsilon,\theta}\nu_a^{\epsilon\theta} \le \epsilon_1.$$

Hence, we can conclude that
$$\limsup_{\epsilon,\theta}|W(x, m^{\epsilon\theta}) - W(x,m)| \le \delta_{\epsilon_1}. \tag{3.9}$$

For $\rho > 0$, define the "mollified" functions $F_{n\theta\rho}(\cdot)$ by
$$\begin{aligned} F_{n\theta\rho}\big(\alpha; x, u(i\delta), i < n,\ & w(p\theta), N(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big) \\ &= \frac{1}{N(\rho)}\int F_{n\theta}\big(\alpha; x, u(i\delta), i < n,\ w(p\theta)+z_p, N(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big)\prod_p e^{-|z_p|^2/2\rho}\,dz_p, \end{aligned}$$
where $N(\rho)$ is a normalizing constant such that the integral of the mollifier is unity. The $F_{n\theta\rho}$ are nonnegative, their values sum (over $\alpha \in U^\epsilon$) to unity, and they are continuous in the $w$-variables for each value of the other variables. Also, they converge to the unmollified function with probability one as $\rho \to 0$. The last assertion and the continuity are consequences of the fact that the probability distribution of a normally distributed random variable is absolutely continuous with respect to Lebesgue measure.
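As a purely illustrative sketch of this mollification step, the following code smooths a conditional law, given as a function of finitely many Wiener samples, by averaging over independent Gaussian perturbations of those samples; a Monte Carlo average stands in for the integral above. The callable `F` and the sample vector are hypothetical placeholders.

import numpy as np

def mollify_conditional_law(F, w_samples, rho, n_mc=2000, rng=None):
    """Gaussian mollification of alpha -> F(alpha; w_samples).

    F         : callable returning a dict {alpha: probability} for given samples
    w_samples : array of Wiener-process samples w(p*theta), p = 1, ..., P
    rho       : mollification parameter (variance of the perturbation)
    n_mc      : number of Monte Carlo draws approximating the integral
    """
    rng = rng or np.random.default_rng(0)
    w_samples = np.asarray(w_samples, dtype=float)
    acc = {}
    for _ in range(n_mc):
        z = rng.normal(scale=np.sqrt(rho), size=w_samples.shape)  # z_p ~ N(0, rho)
        law = F(w_samples + z)                                    # law at perturbed samples
        for alpha, prob in law.items():
            acc[alpha] = acc.get(alpha, 0.0) + prob
    total = sum(acc.values())                                     # keeps the sum over alpha equal to one
    return {alpha: v / total for alpha, v in acc.items()}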
Let $u^{\epsilon\theta\rho}(\cdot)$ be the piecewise constant admissible control which is determined by the conditional probability distribution $F_{n\theta\rho}(\cdot)$: in particular, there is a probability space on which we can define $(w^{\epsilon\theta\rho}(\cdot), N^{\epsilon\theta\rho}(\cdot))$ and the control law $u^{\epsilon\theta\rho}(\cdot)$ by the conditional probability
$$P\{u^{\epsilon\theta\rho}(n\delta) = \alpha \mid x, u^{\epsilon\theta\rho}(i\delta), i < n,\ w^{\epsilon\theta\rho}(s), N^{\epsilon\theta\rho}(s), s \le n\delta\} = F_{n\theta\rho}\big(\alpha; x, u^{\epsilon\theta\rho}(i\delta), i < n,\ w^{\epsilon\theta\rho}(p\theta), N^{\epsilon\theta\rho}(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big).$$
Then, by the construction of the probability law of the controls,
$$(x^{\epsilon\theta\rho}(\cdot), m^{\epsilon\theta\rho}(\cdot), w^{\epsilon\theta\rho}(\cdot), N^{\epsilon\theta\rho}(\cdot)) \Rightarrow (x^{\epsilon\theta}(\cdot), m^{\epsilon\theta}(\cdot), w^{\epsilon\theta}(\cdot), N^{\epsilon\theta}(\cdot))$$
as $\rho \to 0$. We therefore have
$$\limsup_{\epsilon,\theta,\rho}\nu_a^{\epsilon\theta\rho} \le \epsilon_1$$
and
$$\limsup_{\epsilon,\theta,\rho}|W(x, m^{\epsilon\theta\rho}) - W(x,m)| \le \delta_{\epsilon_1}. \tag{3.10}$$

Putting the above arguments together, and noting that $\epsilon_1$ can be chosen arbitrarily small, yields that for each $\epsilon > 0$ there are $\delta > 0$, $\theta > 0$, $q$, $w^\epsilon(\cdot)$, $N^\epsilon(\cdot)$, and an admissible control law which is piecewise constant (on the intervals $[n\delta, n\delta+\delta)$), with values in a finite set $U^\epsilon \subset U$, and determined by the conditional probability law
$$P\{u^\epsilon(n\delta) = \alpha \mid x, u^\epsilon(i\delta), i < n,\ w^\epsilon(s), N^\epsilon(s), s \le n\delta\} = F_n\big(\alpha; x, u^\epsilon(i\delta), i < n,\ w^\epsilon(p\theta), N^\epsilon(p\theta,\Gamma_j), j \le q, p\theta < n\delta\big), \tag{3.11}$$
where the $F_n(\cdot)$ are continuous with probability one in the $w$-variables, for each value of the other variables, and for which (3.1) holds. Owing to the weak sense uniqueness (A2.2'), without loss of generality we can apply a mollifier to the $x$-dependence and suppose that there is continuity in the $(x, w)$-variables.
Under the uniform integrability condition, we can restrict our attention to a finite time interval and the same proof works. $\blacksquare$

10.4 The Approximating Markov Chain: Weak Convergence
Let us recall the basic problem to be dealt with. Given an approximation parameter $h$, the basic computational model is a controlled discrete parameter Markov chain $\{\xi_n^h, n < \infty\}$ which is "locally consistent" with the controlled process (1.1) in the sense used in (4.1.3) (for the process without jumps) and in Subsection 5.6.2 for the general jump diffusion process. Here we are concerned with the cost function with an absorbing boundary or a target set. The problem with reflecting boundaries will be dealt with in Chapter 11. A cost function (2.1) for (1.1) is given, with minimum value $V(x)$. Some appropriate cost function for the chain such as (4.3.6) is given, and its optimal value $V^h(x)$ computed. We wish to prove that $V^h(x) \to V(x)$ as $h \to 0$.
The optimal value function for the discrete parameter chain is also an optimal value function for the controlled continuous parameter Markov chain model $\psi^h(\cdot)$ whose properties are defined in Chapters 4 and 5. (If $\beta > 0$, then there might be a small difference in the value functions for the two models, depending on how the discounting is approximated, but the difference goes to zero as $h \to 0$, as seen in Chapter 4.) The optimal control for the chain $\{\xi_n^h, n < \infty\}$ is a feedback control, which we denote by $u^h(x)$. In the proofs to follow, the controls which will be used on these approximating processes will always be ordinary stochastic controls, whether they are of the pure Markov or feedback form or not. The relaxed control terminology is not needed if we are only interested in the behavior of a particular controlled chain. However, as $h \to 0$ the sequence of ordinary controls might not converge in any traditional sense, and the use of the relaxed control concept enables us to obtain and appropriately characterize limits. We have seen an example of this in Sections 4.5 and 4.6.
The sole reason for using the approximating chain is to get an approxima-
tion to the optimal value function V(x). Under appropriate regularity con-
ditions and also nondegeneracy of a( x), the convergence theorems of numer-
ical analysis can often be used to get the convergence Vh(x) --t V(x), when
the approximations are based on (for example) finite difference schemes.
But, owing to the common presence of degeneracy and to other "nonreg-
ularities," this cannot be done as often as one would like. The methods
of proof to be employed are purely probabilistic. They rely on the fact
that the sequence of "approximating" processes which are defined by the
optimal solutions to the Markov chain control problem can be shown to
converge to an optimal process of the original form (1.1). The most useful
methods of proof are those of the theory of weak convergence of probability
measures, which provide the tools needed to characterize the limits of the
sequence of approximating processes. An example was given in Chapter 9.
In this section, we will prove the convergence of the continuous parameter
interpolations 1/Jh(-) to controlled processes of the form of (1.1).

10.4.1 Approximations and representations for $\psi^h(\cdot)$


Let us recall the definitions of Section 5.6. The process $\{\xi_n^h, n < \infty\}$ is an approximating controlled Markov chain. The jump times of the interpolation $\psi^h(\cdot)$ are denoted by $\{\tau_n^h, n < \infty\}$. If $\{u_n^h, n < \infty\}$ denotes an admissible control sequence, then the interpolation $u^h(\cdot)$ is defined by $u^h(t) = u_n^h$ on $[\tau_n^h, \tau_{n+1}^h)$. Recall the representation (5.6.11)
$$\psi^h(t) = x + \int_0^t b(\psi^h(s), u^h(s))\,ds + M^h(t) + J^h(t) + \delta_1^h(t), \tag{4.1}$$
where $E|\delta_1^h(t)| \to 0$, uniformly on any bounded time interval as $h \to 0$, and the processes $M^h(\cdot)$ and $J^h(\cdot)$ are defined below (5.6.11).
Define the relaxed control representation $m^h(\cdot)$ of $u^h(\cdot)$ by its derivative
$$m_t^h(A) = I_{\{u^h(t) \in A\}}, \quad A \subset U \text{ Borel}, \tag{4.2}$$
or, equivalently, $m_t^h(\{\alpha\}) = 1$ if $u^h(t) = \alpha$. Then we can write (4.1) in relaxed control notation as
$$\psi^h(t) = x + \int_0^t\!\!\int_U b(\psi^h(s),\alpha)\,m_s^h(d\alpha)\,ds + M^h(t) + J^h(t) + \delta_1^h(t). \tag{4.3}$$

Recall that the quadratic variation of the martingale $M^h(\cdot)$ is
$$\int_0^t a(\psi^h(s))\,ds + \delta_2^h(t),$$
where $E\sup_{s\le t}|\delta_2^h(s)| \to 0$. [See below (5.6.11).]
The discounted cost function (4.3.6) is
$$W^h(x, m^h) = E_x^{m^h}\int_0^{\tau_h}\!\!\int_U e^{-\beta s}k(\psi^h(s),\alpha)\,m_s^h(d\alpha)\,ds + E_x^{m^h}e^{-\beta\tau_h}g(\psi^h(\tau_h)), \tag{4.4}$$
and $V^h(x)$ denotes the minimum value. The dynamic programming equation is (4.3.7). Recall that we can approximate the discount factor by any quantity $d^h(x,\alpha)$ such that $d^h(x,\alpha)/e^{-\beta\Delta t^h(x,\alpha)} \to 1$, as discussed in Chapter 4. In the next section it will be shown that $V^h(x) \to V(x)$.
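For orientation only, here is a minimal Monte Carlo sketch of the cost (4.4) for the interpolated chain under a fixed feedback control. The one-step sampler, the feedback law, the interpolation intervals, and the cost data are hypothetical placeholders for the particular approximating chain; in practice $V^h(x)$ is of course computed from the dynamic programming equation (4.3.7) rather than by simulation.

import numpy as np

def simulate_cost(x0, step, u, dth, k, g, beta, in_G0, n_paths=1000, t_max=1e3, seed=0):
    """Monte Carlo estimate of the discounted cost (4.4) for the chain psi^h
    started at x0 and run with feedback control u(x) until it leaves G^0."""
    rng = np.random.default_rng(seed)
    costs = []
    for _ in range(n_paths):
        x, t, total = x0, 0.0, 0.0
        while in_G0(x) and t < t_max:
            a = u(x)                              # control used on this step
            dt = dth(x, a)                        # interpolation interval Delta t^h(x, a)
            total += np.exp(-beta * t) * k(x, a) * dt
            x = step(x, a, rng)                   # draw the next state from p^h(x, . | a)
            t += dt
        total += np.exp(-beta * t) * g(x)         # stopping cost at the first exit
        costs.append(total)
    return float(np.mean(costs))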

Definition of an Auxiliary Process: An Approximation to the


Driving Wiener Process. In the next theorem, we will show that the se-
quence of processes {1/Jh (·)} is tight and that any subsequence has a further
subsequence which converges weakly to a controlled diffusion x( ·) of the
type (1.1), for some driving Wiener process, Poisson measure and admis-
sible relaxed control. For the purposes of proving the desired convergence
of the optimal value functions, it is useful to define a process which will
converge to the actual Wiener process "driving" the limit x( ·). The method
is analogous to what is used to get the driving Wiener process for a process
which solves the martingale problem. This approximation will be obtained
essentially by decomposing and "inverting" Mh (·).
Factor
$$a(x) = \sigma(x)\sigma'(x) = P(x)D^2(x)P'(x),$$
where $P(x)$ is an orthonormal matrix, $D(x)$ is diagonal, and we can assume that each is a measurable function of $x$. Define $P_h(t) = P(\psi^h(t))$, $D_h(t) = D(\psi^h(t))$. We can factor $a(\psi^h(t)) = \sigma(\psi^h(t))\sigma'(\psi^h(t))$ as
$$a(\psi^h(t)) = P_h(t)D_h^2(t)P_h'(t).$$
Denote the diagonal entries of $D_h(t)$ by $\{d_{h,i}(t), i \le r\}$. Let $\delta_0(h) \to 0$ denote the maximum step size of $\xi_{n+1}^h - \xi_n^h$ for the approximating chain for the "diffusion" steps, those excluding the jumps which are due to the approximation of the effects of the Poisson measure. Let $\delta_1(h) > 0$ be such that it goes to zero and $\delta_0(h)/\delta_1(h) \to 0$. Define the diagonal matrix $D_h^+(t)$ with entries $d_{h,i}^{-1}(t)I_{\{d_{h,i}(t) \ge \delta_1(h)\}}$, which are defined to equal zero if $d_{h,i}(t) = 0$. Let $w(\cdot)$ be a Wiener process which is independent of $\{\xi_n^h, u_n^h, n < \infty\}$. Define the process $w^h(\cdot)$ by
$$w^h(t) = \int_0^t D_h^+(s)P_h'(s)\,dM^h(s) + \int_0^t\big(I - D_h(s)D_h^+(s)\big)\,dw(s). \tag{4.5}$$
The first term is just a finite (w.p.1) sum, and the second is a stochastic integral. It can be easily verified that the defined process $w^h(\cdot)$ is a martingale. The processes defined by the two terms in (4.5) are orthogonal martingales. The quadratic variation of $w^h(\cdot)$ is just the sum of the quadratic variations of the two components and is
$$\int_0^t D_h^+(s)P_h'(s)\big[P_h(s)D_h^2(s)P_h'(s)\big]P_h(s)D_h^+(s)\,ds + E_h(t) + \int_0^t\big(I - D_h(s)D_h^+(s)\big)\big(I - D_h(s)D_h^+(s)\big)\,ds = tI + E_h(t), \tag{4.6}$$
where $I$ is the identity matrix and $E_h(t)$ is an error which goes to zero as $h \to 0$, and is due to the error $a^h(x) - a(x)$ [see (4.1.3)].
The second term in (4.5) was constructed to compensate for the degeneracies in the first term; in particular, to assure that the quadratic variation of $w^h(\cdot)$ is close to that of a Wiener process. The first term on the right side of (4.5) is linear between the jump times, and the jumps are bounded above by $\delta_0(h)/\delta_1(h)$, which goes to zero uniformly in all other variables as $h \to 0$. The truncation level $\delta_1(h)$ was chosen to assure that the jumps in $w^h(\cdot)$ would go to zero as $h \to 0$, so that any weak limit would have continuous paths with probability one and, in fact, be a standard Wiener process. The fact that any weak limit is a Wiener process is implied by the fact that a continuous local martingale whose quadratic variation function is $tI$ must be a Wiener process (Section 9.3). Using the "differential" notation, note that [ignoring the negligible error $E_h(t)$]
$$\begin{aligned} \sigma(\psi^h(t))\,dw^h(t) &= P_h(t)D_h(t)\,dw^h(t) \\ &= P_h(t)D_h(t)\big[D_h^+(t)P_h'(t)\,dM^h(t) + (I - D_h(t)D_h^+(t))\,dw(t)\big] \\ &= dM^h(t) + \big[P_h(t)D_h(t)D_h^+(t)P_h'(t) - I\big]\,dM^h(t) + O(\delta_1(h))\,dw(t). \end{aligned}$$
Thus, we can write
$$M^h(t) = \int_0^t\sigma(\psi^h(s))\,dw^h(s) + \varepsilon^h(t),$$
where, for each $t$, $E\sup_{s\le t}|\varepsilon^h(s)| \to 0$ as $h \to 0$. We can now write (4.1) as
$$\psi^h(t) = x + \int_0^t\!\!\int_U b(\psi^h(s),\alpha)\,m_s^h(d\alpha)\,ds + \int_0^t\sigma(\psi^h(s))\,dw^h(s) + J^h(t) + \varepsilon^h(t), \tag{4.7}$$
where, for each $t$, $\lim_{h\to 0}E\sup_{s\le t}|\varepsilon^h(s)| = 0$. Copying (5.6.12), write the jump term $J^h(\cdot)$ as

$$J^h(t) = \sum_{n:\,\nu_n^h \le t} q^h(\psi^h(\nu_n^h-), \rho_n), \tag{4.8}$$
where the terms in (4.8) are defined in Section 5.6. An approximation $N^h(\cdot)$ to a Poisson measure can be written in terms of $\{\nu_n^h, \rho_n\}$. For a Borel set $H$ in $\Gamma$, define $N^h(t,H)$ by
$$N^h(t,H) = \sum_{n:\,\nu_n^h \le t} I_{\{\rho_n \in H\}}. \tag{4.9}$$

Let $\mathcal{F}_t^h$ denote the minimal $\sigma$-algebra which measures
$$\{\psi^h(s), m_s^h(\cdot), w^h(s), N^h(s),\ s \le t\}.$$
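The construction (4.5) is straightforward to carry out numerically once the martingale increments of the chain are available. The sketch below, with hypothetical inputs, forms one increment of $w^h$ from an increment of $M^h$: it diagonalizes $a(x) = PD^2P'$, inverts the diagonal entries above the truncation level $\delta_1(h)$, and fills the (near-)degenerate directions with an independent Wiener increment.

import numpy as np

def wh_increment(a_x, dM, dt, delta1, rng):
    """One increment of the auxiliary process w^h defined in (4.5).

    a_x    : covariance matrix a(x) = sigma(x) sigma(x)' at the current state
    dM     : increment of the martingale M^h over the current step
    dt     : length of the interpolation interval for the step
    delta1 : truncation level delta_1(h) for the diagonal entries of D
    """
    # Factor a(x) = P D^2 P' with P orthonormal and D diagonal, D >= 0.
    eigvals, P = np.linalg.eigh(a_x)
    d = np.sqrt(np.clip(eigvals, 0.0, None))                            # diagonal of D
    d_plus = np.where(d >= delta1, 1.0 / np.where(d > 0, d, 1.0), 0.0)  # entries of D^+
    # First term of (4.5): D^+ P' dM^h.
    dw_h = d_plus * (P.T @ dM)
    # Second term: (I - D D^+) dw, with dw an independent Wiener increment.
    dw = rng.normal(scale=np.sqrt(dt), size=d.shape)
    dw_h += (1.0 - d * d_plus) * dw
    return dw_h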
We are now prepared for the convergence theorem.

10.4.2 The convergence theorem for the interpolated chains


Theorem 4.1. Assume (A1.1)–(A1.2), and let the approximating chain $\{\xi_n^h, n < \infty\}$ be locally consistent with (1.1). Let $\{u_n^h, n < \infty\}$ denote the admissible sequence of controls which is used. Let $\psi^h(\cdot)$ denote the continuous parameter Markov chain interpolation and $m^h(\cdot)$ a relaxed control representation of $\{u_n^h, n < \infty\}$ for $\psi^h(\cdot)$. Let $\{\bar\tau_h\}$ be a sequence of $\mathcal{F}_t^h$-stopping times. Then $\{\psi^h(\cdot), m^h(\cdot), w^h(\cdot), N^h(\cdot), \bar\tau_h\}$ is tight. Let the limit of a weakly convergent subsequence be denoted by $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$, and let $\mathcal{F}_t$ denote the $\sigma$-algebra induced by $\{x(s), m(s), w(s), N(s),\ s \le t,\ \bar\tau I_{\{\bar\tau \le t\}}\}$. Then $w(\cdot)$ and $N(\cdot)$ are a standard $\mathcal{F}_t$-Wiener process and Poisson measure, respectively, $\bar\tau$ is an $\mathcal{F}_t$-stopping time, and $m(\cdot)$ is an admissible control. Let the jump times and jump magnitudes of $N(\cdot)$ be denoted by $\{\nu_n, \rho_n\}$. We also have
$$x(t) = x + \int_0^t\!\!\int_U b(x(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t\sigma(x(s))\,dw(s) + J(t), \tag{4.10}$$
where
$$J(t) = \sum_{n:\,\nu_n \le t} q(x(\nu_n-),\rho_n) = \int_0^t\!\!\int_\Gamma q(x(s-),\rho)\,N(ds\,d\rho). \tag{4.11}$$

Proof. The direct method of Theorem 1.1 will be used. The sequences $\{m^h(\cdot), \bar\tau_h\}$ are always tight since their range spaces are compact. Let $T < \infty$, and let $\hat\tau_h$ be an $\mathcal{F}_t^h$-stopping time which is no bigger than $T$. Then for $\delta > 0$,
$$E\big|w^h(\hat\tau_h+\delta) - w^h(\hat\tau_h)\big|^2 = O(\delta) + e_h,$$
where $e_h \to 0$ uniformly in $\hat\tau_h$. Thus, by Theorem 9.2.1, the sequence $\{w^h(\cdot)\}$ is tight. A similar argument yields the tightness of $\{M^h(\cdot)\}$. The sequence $\{N^h(\cdot)\}$ is tight (Theorem 9.2.1) because the mean number of jumps on any bounded interval $[t, t+s]$ is bounded by $\lambda s + e^h(s)$, where $e^h(s)$ goes to zero as $h \to 0$, and
$$\lim_{\delta\to 0}\inf_{h,n}P\{\nu_{n+1}^h - \nu_n^h > \delta \mid \text{data up to }\nu_n^h\} = 1.$$
This also implies the tightness of $\{J^h(\cdot)\}$. Finally, these results and the boundedness of $b(\cdot)$ imply the tightness of $\{\psi^h(\cdot)\}$.
For $\delta > 0$ and any process $y(\cdot)$, define the process $y_\delta(\cdot)$ by $y_\delta(t) = y(n\delta)$, $t \in [n\delta, n\delta+\delta)$. Then, by the tightness of $\{\psi^h(\cdot)\}$, (4.7) can be written as
$$\psi^h(t) = x + \int_0^t\!\!\int_U b(\psi^h(s),\alpha)\,m_s^h(d\alpha)\,ds + J^h(t) + \int_0^t\sigma(\psi_\delta^h(s))\,dw^h(s) + \varepsilon^{h,\delta}(t), \tag{4.12}$$
where $\lim_{\delta\to 0}\limsup_{h\to 0}E|\varepsilon^{h,\delta}(t)| = 0$. We next characterize $w(\cdot)$ and $N(\cdot)$. A slight variation of the proof of Theorem 1.1 will be used. Using the notation of that theorem, we have
$$E H\big(\psi^h(t_i), w^h(t_i), (\phi_j, m^h)_{t_i}, N^h(t_i,\Gamma_j),\ j \le q,\ i \le p,\ \bar\tau_h I_{\{\bar\tau_h \le t\}}\big)\times\big[w^h(t+u) - w^h(t)\big] = 0. \tag{4.13}$$
Abusing notation, let $h$ index a weakly convergent subsequence with limit denoted by $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$. Then, taking weak limits in (4.13) yields
$$E H\big(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i,\Gamma_j),\ j \le q,\ i \le p,\ \bar\tau I_{\{\bar\tau \le t\}}\big)\times\big[w(t+u) - w(t)\big] = 0. \tag{4.14}$$

Because $w(\cdot)$ is continuous, as in Theorem 1.1, (4.14) implies that $w(\cdot)$ is a continuous $\mathcal{F}_t$-martingale. An analogous proof yields that
$$E H\big(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i,\Gamma_j),\ j \le q,\ i \le p,\ \bar\tau I_{\{\bar\tau \le t\}}\big)\times\big[w(t+u)w'(t+u) - w(t)w'(t) - uI\big] = 0. \tag{4.15}$$
Thus, the quadratic variation of the martingale $w(\cdot)$ is just $tI$; hence it is an $\mathcal{F}_t$-Wiener process. The proof that $N(\cdot)$ is an $\mathcal{F}_t$-Poisson measure also follows the line of argument of Theorem 1.1, and the details are omitted.
Taking limits in (4.12) as $h \to 0$ yields
$$x(t) = x + \int_0^t\!\!\int_U b(x_\delta(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t\sigma(x_\delta(s))\,dw(s) + J(t) + e_\delta(t),$$
where $J(\cdot)$ is as in (4.11) and where $\lim_{\delta\to 0}E|e_\delta(t)| = 0$. Finally, taking limits in this equation as $\delta \to 0$ yields (4.10). $\blacksquare$

10.5 Convergence of the Costs: Discounted Costs and Absorbing Boundary
We next treat the convergence of the costs $W^h(x, m^h)$ given by (4.4), where $m^h(\cdot)$ is a sequence of admissible relaxed controls for $\psi^h(\cdot)$. It will also be proved that
$$V^h(x) \to V(x). \tag{5.1}$$
By the results of Theorem 4.1, with $\bar\tau_h = \tau_h$, we know that each sequence
of the type used in Theorem 4.1 has a weakly convergent subsequence whose limit processes satisfy (4.10). Abusing notation, let the given sequence converge weakly and denote the limit by $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$. Let $\beta > 0$. Then, by the weak convergence Theorem 4.1, it is always the case that
$$E_x^{m^h}\int_0^{\tau_h}\!\!\int_U e^{-\beta s}k(\psi^h(s),\alpha)\,m_s^h(d\alpha)\,ds \to E_x^m\int_0^{\bar\tau}\!\!\int_U e^{-\beta s}k(x(s),\alpha)\,m_s(d\alpha)\,ds$$
and
$$E_x^{m^h}e^{-\beta\tau_h}g(\psi^h(\tau_h)) \to E_x^m e^{-\beta\bar\tau}g(x(\bar\tau)).$$
It is not always the case that the limit $\bar\tau = \tau$, the first time of escape of $x(\cdot)$ from $G^0$, analogous to the situation in Section 10.2 and Chapter 9. All the considerations discussed in these sections concerning the continuity of the exit times also hold here. Using Theorem 4.1 and following the procedure in Section 10.2 for dealing with the continuity of the first exit time, we have the following theorem, which is one half of the desired result (5.1). The last assertion of Theorem 5.1 follows from the weak convergence, Fatou's lemma, and the fact that (using the Skorokhod representation of Chapter 9 for a weakly convergent subsequence, so that the convergence is with probability one) $\liminf_h \tau_h \ge \tau$. A criterion for the uniform integrability is given in Theorem 5.2.

Theorem 5.1. Assume (A1.1)–(A1.3) and (A2.1). Let
$$\{\psi^h(\cdot), m^h(\cdot), w^h(\cdot), N^h(\cdot), \tau_h\}$$
converge weakly to $(x(\cdot), m(\cdot), w(\cdot), N(\cdot), \bar\tau)$. Let the limit process $x(\cdot)$ satisfy (A2.2), or else use the randomized stopping rule. Then $\bar\tau = \tau$. If $\beta > 0$, or if $\{\tau_h\}$ is uniformly integrable, then
$$W^h(x, m^h) \to W(x,m) \ge V(x), \tag{5.2}$$
where the cost $W(x,m)$ is for the limit process. Also,
$$\liminf_h V^h(x) \ge V(x). \tag{5.3}$$
Let $\beta \ge 0$, $k(\cdot) \ge 0$, and $g(\cdot) = 0$. Then
$$\liminf_h W^h(x, m^h) \ge W(x,m)$$
and (5.3) holds.

The Convergence Theorem. Let $\beta > 0$. In view of (5.3), in order to get the convergence (5.1), we need to prove the reverse inequality
$$\limsup_h V^h(x) \le V(x). \tag{5.4}$$

The main idea in the proof is to use the minimality of the cost function $V^h(x)$ for the Markov chain control problem. Given an "almost optimal" control for the $x(\cdot)$ process, we adapt it for use on the chain, and then use the minimality of $V^h(x)$ and the weak convergence to get (5.4). Note that $\tau'$ in (5.6) below is the infimum of the escape times from the closed set $G$. It is larger than or equal to $\tau$, the escape time from the interior $G^0$.
Condition (5.6) holds if there is some $i$ such that for each $x \in G$,
$$a_{ii}(x) > 0. \tag{5.5}$$
In fact, the proof shows that (5.6) implies the uniform integrability of $\{\tau_h\}$.

Theorem 5.2. Assume (A1.1)–(A1.4) and (A2.1). Let $\beta > 0$, and assume (A2.2). Then (5.4) and hence (5.1) hold. If instead (A2.2') holds, or if the randomized stopping rule is used, then (5.1) continues to hold.
Let $\beta = 0$. Then the assertion continues to hold if $\{\tau_h\}$ is uniformly integrable. Define $\tau' = \inf\{t : x(t) \notin G\}$. Assume that there are $T_1 < \infty$ and $\delta_1 > 0$ such that
$$\inf_m P_x^m\{\tau' \le T_1\} \ge \delta_1 \quad \text{for all } x \in G. \tag{5.6}$$
Then under the other conditions of the $\beta > 0$ case, the conclusions continue to hold.

Proof. The proof of the first part of the theorem is given only under (A2.2), since the proofs under (A2.2') and the randomized stopping rule are similar. Let $\beta > 0$ and let $\epsilon$ and $\theta$ be as in Theorem 3.1. As noted above, we only need to prove (5.4). Let $(m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$ be an admissible triple for (1.1), where $m^\epsilon(\cdot)$ is a relaxed control representation of an ordinary control which is determined by the conditional distribution on the right side of (3.11). Let $x^\epsilon(\cdot)$ denote the associated solution to (1.1). By Theorem 3.1 and (A2.2), we can suppose that $(x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot))$ is $\epsilon$-optimal and that
$$P_x^{m^\epsilon}\{f(\cdot) \text{ not continuous at } x^\epsilon(\cdot)\} = 0.$$
Let $\{\xi_n^h, n < \infty\}$ and $\psi^h(\cdot)$ denote the controlled Markov chain and continuous parameter interpolation, respectively, for the control law to be defined below. Similarly, let $w^h(\cdot)$ and $N^h(\cdot)$ be defined by (4.5) and (4.9), respectively, for this chain, and let $\tau_h$ denote the first escape time from $G^0$. The $(w^h(\cdot), N^h(\cdot))$ will replace the $(w(\cdot), N(\cdot))$ in the arguments of the $F_n(\cdot)$ of (3.11). Because the interpolated process $\psi^h(\cdot)$ changes values at random times which might not include the times $\{n\delta\}$ at which the control changes in the discrete time approximation which led to (3.11), we need to alter slightly the timing of the changes of the control values. Let $\{\tau_k^h, k < \infty\}$ denote the jump times of $\psi^h(\cdot)$ and define
$$\sigma_n^h = \min\{\tau_k^h : \tau_k^h \ge n\delta\},$$
the first jump time of $\psi^h(\cdot)$ after or at time $n\delta$. For each $n$, we have
$$\sigma_{n+1}^h - \sigma_n^h \to \delta \quad \text{in probability.} \tag{5.7}$$
We will choose $\{u_n^h, n < \infty\}$ such that $u^h(\cdot)$ will be constant on the intervals $[\sigma_n^h, \sigma_{n+1}^h)$, with values determined by the conditional probability law (3.11). In particular, for $k$ such that $\tau_k^h \in [n\delta, n\delta+\delta)$, use the control law $u_k^h = u^h(\sigma_n^h)$, which is determined by the following conditional distribution at time $\sigma_n^h$:
$$P\{u^h(\sigma_n^h) = \alpha \mid x, u^h(\sigma_i^h), i < n,\ w^h(s), N^h(s), s \le \sigma_n^h\} = F_n\big(\alpha; x, u^h(\sigma_i^h), i < n,\ w^h(p\theta), N^h(p\theta,\Gamma_j),\ j \le q,\ p\theta < \sigma_n^h\big). \tag{5.8}$$
Then, by Theorems 4.1 and 5.1, the assumptions concerning the $\epsilon$-optimality, and the with probability one continuity properties of $f(\cdot)$ with respect to the measure of $x^\epsilon(\cdot)$ with $x^\epsilon(0) = x$, we have
$$(\psi^h(\cdot), m^h(\cdot), w^h(\cdot), N^h(\cdot), \tau_h) \Rightarrow (x^\epsilon(\cdot), m^\epsilon(\cdot), w^\epsilon(\cdot), N^\epsilon(\cdot), \tau^\epsilon) \tag{5.9}$$
and
$$\limsup_h V^h(x) \le \limsup_h W^h(x, m^h) = W(x, m^\epsilon) \le V(x) + \delta_3(\epsilon), \tag{5.10}$$
where $\delta_3(\epsilon) \to 0$ as $\epsilon \to 0$. This yields the first assertion of the theorem.


Now let $\beta = 0$ and assume (5.6). The main difference between the two cases $\beta > 0$ and $\beta = 0$ concerns the finiteness of the costs and of the "effective stopping time." We will prove that, for each integer $k$,
$$\limsup_h\ \sup_{x\in G,\,m} E_x^m(\tau_h)^k < \infty. \tag{5.11}$$

We show first that (5.6) implies that there are $T_2 < \infty$ and $\delta_4 > 0$ such that for small $h$,
$$\inf_{x\in G,\,m} P_x^m\{\tau_h \le T_2\} \ge \delta_4. \tag{5.12}$$
Suppose that (5.12) does not hold. Then there are sequences $y_h \in G$, $T_h \to \infty$, and admissible $m^h(\cdot)$ and associated processes $\psi^h(\cdot)$ with initial conditions $y_h \in G$ such that
$$P_{y_h}^{m^h}\{\tau_h \le T_h\} \to 0. \tag{5.13}$$
Extract a weakly convergent subsequence (indexed also by $h$ for notational convenience) of $\{\psi^h(\cdot), m^h(\cdot), \tau_h\}$ with limit $(x(\cdot), m(\cdot), \bar\tau)$. Then $y_h \to y_0 = x(0) \in G$. By the weak convergence and (5.6),
$$\liminf_h P_{y_h}^{m^h}\{\tau_h \le 2T_1\} \ge P_{y_0}^m\{\tau' \le T_1\} \ge \delta_1,$$
which contradicts (5.13). Thus, (5.12) holds.
Now, let $m^h(\cdot)$ be an arbitrary admissible control for the interpolated chain $\psi^h(\cdot)$, and let $\tau_h$ denote the associated escape time from $G^0$. Let $\mathcal{F}_t^h$ denote the $\sigma$-algebra generated by the values of $\psi^h(\cdot)$ and $m^h(\cdot)$ up to time $t$. Now, by (5.12), for small $h$ we can write
$$P_x^{m^h}\{\tau_h > (n+1)T_2 \mid \mathcal{F}_{nT_2}^h\} \le 1 - \delta_4/2 \quad \text{on } \{\tau_h > nT_2\},$$
which implies that $E_x^{m^h}\tau_h \le T_2 + T_2/(\delta_4/2)$, and indeed that for each integer $k$, $E_x^m(\tau_h)^k$ is bounded uniformly in $h$, $m$, and $x \in G$. $\blacksquare$

Convergence in the Absence of Local Consistency at Some Points


and Discontinuous Dynamical and Cost Terms. In the discussion of
variable grids in Section 5.5, we encountered a problem with local consis-
tency on the set Ao. Owing to the way that the approximating chain was
constructed on A0 , the absence of local consistency there causes no prob-
lem with the convergence. This is a special case of the next theorem, whose
proof follows essentially from those of Theorems 4.1 and 5.2.

Theorem 5.3. Assume the conditions of Theorem 4.1 with the following exceptions. (i) There are sets $\tilde G_h \subset G$ and compact $\tilde G$ such that $\tilde G_h \downarrow \tilde G$ and there is local consistency of the approximating chain except possibly on $\tilde G_h \cap G_h$. (ii) There are bounded $\{\tilde b_n^h, \tilde a_n^h\}$ such that for $x \in \tilde G_h$,
$$E_{x,n}^{h,\alpha}\Delta\xi_n^h = \tilde b_n^h\,\Delta t^h(x,\alpha) + o(\Delta t^h(x,\alpha)),$$
$$\operatorname{cov}_{x,n}^{h,\alpha}\Delta\xi_n^h = \tilde a_n^h\,\Delta t^h(x,\alpha) + o(\Delta t^h(x,\alpha)).$$


Let $h$ index a weakly convergent subsequence. Then the conclusions of Theorem 4.1 hold, except that the limit takes the form
$$x(t) = x + \int_0^t\tilde b(s)\,ds + \int_0^t\tilde\sigma(s)\,dw(s) + J(t), \tag{5.14}$$
where for $x(t) \notin \tilde G$, we have
$$\tilde b(t) = \int_U b(x(t),\alpha)\,m_t(d\alpha), \qquad \tilde a(t) = \tilde\sigma(t)\tilde\sigma'(t) = a(x(t)).$$
Let $B(y)$, $A(y)$ denote the sets of possible values of $\tilde b(t)$, $\tilde a(t)$, respectively, when $x(t) = y \in \tilde G$. Suppose that the solution to (5.14) does not depend on
the choices of the "tilde" functions in $\tilde G$ within the values allowed by the sets $B(y)$, $A(y)$. Then the limit does not depend on the choices made in the sets $\tilde G_h$.
If the conditions of Theorem 5.2 hold, but with the above exceptions, then
the conclusions of that theorem continue to hold.

Remark. The theorem also elaborates the remarks concerning discontin-


uous dynamics and cost terms which were made at the end of Subsection
10.1.1. The theorem can be extended to cover the cases of the next two
chapters. The theorem holds for the problem of Section 5.5, where G = A0 ,
and the set A(x) consists of the diagonal matrices whose second diagonal
entry is unity, and the first is bounded by 1/6.

10.6 The Optimal Stopping Problem


We now discuss the optimal stopping problem. The continuous time form
was introduced in Section 3.2, and the Markov chain forms in Sections 2.2
and 5.8.
We use the uncontrolled process model
$$x(t) = x + \int_0^t b(x(s))\,ds + \int_0^t\sigma(x(s))\,dw(s) + \int_0^t\!\!\int_\Gamma q(x(s-),\rho)\,N(ds\,d\rho), \tag{6.1}$$
where $w(\cdot)$ and $N(\cdot)$ are our standard Wiener process and Poisson measure, respectively, with respect to some filtration $\mathcal{F}_t$. Let $\rho$ be an $\mathcal{F}_t$-stopping time. Define the cost
$$W(x,\rho) = E_x\int_0^\rho k(x(s))\,ds + E_x g(x(\rho)). \tag{6.2}$$

A discount factor could be added with no additional difficulty. Also, a continuously acting control can be added, and then we would require a combination of the details of this section and Sections 10.4 and 10.5, but we will still have the convergence $V^h(x) \to V(x)$ under the conditions required in Section 10.5 [and, if $\beta = 0$, the positivity of the cost rate $k(\cdot)$].
We are also given a compact set $G$ such that the process must stop by the time $\tau = \inf\{t : x(t) \notin G^0\}$ if it has not been stopped earlier. We wish to select the stopping time $\rho \le \tau$ which minimizes the cost. Define $V(x) = \inf_{\rho\le\tau}W(x,\rho)$.
The following assumption will be needed:
The following assumption will be needed:

A6.1. The solution to (6.1) is unique in the weak sense for each initial condition $x \in G^0$, in that if $\rho$ is an $\mathcal{F}_t$-stopping time and $x(\cdot)$ is a solution to (6.1), then the probability law of $(w(\cdot), N(\cdot), \rho)$ determines the law of $(x(\cdot), w(\cdot), N(\cdot), \rho)$. Also, either $f(\cdot)$ is continuous with probability one
under the measure of $x(\cdot)$ for each initial condition $x$ of interest, or else the randomized stopping rule is used.

Remarks on (A6.1). The uniqueness condition holds under a uniform Lipschitz condition on $b(\cdot)$, $\sigma(\cdot)$, and $q(\cdot,\rho)$, or if the process (6.1) is obtained via a Girsanov measure transformation from a "base" process satisfying (A6.1).

The next theorem gives a condition which guarantees that we need only
consider stopping times whose moments are uniformly bounded.

Theorem 6.1. Assume (A2.1) and (A1.1)–(A1.3) without the control, and let $\inf_{x\in G}k(x) = k_0 > 0$. Assume (A6.1). Then there exists an optimal stopping time $\bar\rho$ and
$$E_x\bar\rho \le 2\max_{y\in G}|g(y)|/k_0. \tag{6.3}$$

Comment on the Proof. By the existence of an optimal stopping time we mean that there exists a probability space with a filtration $\mathcal{F}_t$, an $\mathcal{F}_t$-Wiener process and Poisson measure with an associated solution to (6.1), and an $\mathcal{F}_t$-stopping time $\bar\rho$, such that $W(x,\bar\rho) \le W(x,\rho)$, where the cost on the right is for any other solution and stopping time on any probability space. The theorem can be proved by a compactness and weak convergence argument as in Theorems 1.1 and 2.1, and we omit the details. The bound (6.3) is a consequence of the fact that if we stop at time $\bar\rho$ instead of at time 0, we must have
$$g(x) \ge k_0 E_x\bar\rho + E_x g(x(\bar\rho)).$$

The Optimal Stopping Problem for the Approximating Markov Chain. The cost function for the approximating discrete parameter chain and a stopping time $N_h$ is
$$W^h(x, N_h) = E_x\sum_{n=0}^{N_h-1}k(\xi_n^h)\,\Delta t_n^h + E_x g(\xi_{N_h}^h).$$
For the continuous parameter chain $\psi^h(\cdot)$ and a stopping time $\rho_h$, the analogous cost is
$$W^h(x, \rho_h) = E_x\int_0^{\rho_h}k(\psi^h(s))\,ds + E_x g(\psi^h(\rho_h)).$$
Let $V^h(x)$ denote the optimal value function. Then the dynamic programming equation for both the discrete and continuous parameter chain
problem is
$$V^h(x) = \min\Big[g(x),\ \sum_y p^h(x,y)V^h(y) + k(x)\,\Delta t^h(x)\Big], \quad x \in G_h,$$
with the boundary condition $V^h(x) = g(x)$ for $x \notin G_h$.
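A direct way to solve this equation numerically is successive approximation: iterate the minimum between stopping and continuing until the values settle. The sketch below does this on a generic finite state set; the transition matrix, interpolation intervals, and cost data are hypothetical placeholders supplied by the particular approximating chain.

import numpy as np

def optimal_stopping_values(ph, dth, k, g, interior, tol=1e-10, max_iter=100000):
    """Solve V(x) = min[g(x), sum_y p^h(x,y) V(y) + k(x) dt^h(x)] on G_h,
    with V(x) = g(x) off G_h, by successive approximation."""
    V = g.astype(float).copy()             # start from the stopping cost
    for _ in range(max_iter):
        cont = ph @ V + k * dth            # cost of continuing for one step
        V_new = np.where(interior, np.minimum(g, cont), g)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V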

The Convergence Theorem. We have the following result:

Theorem 6.2. Under the conditions of Theorem 6.1, or if $\sup_{x\in G}E_x\tau' < \infty$ replaces the strict positivity of $k(\cdot)$, where $\tau' = \inf\{t : x(t) \notin G\}$, we have $V^h(x) \to V(x)$.

Proof. We work with the first set of conditions only. The proof uses an approximation procedure as in Theorems 4.1, 5.1, and 5.2. Let $(\psi^h(\cdot), \rho_h)$ denote the continuous parameter approximating chain and its optimal stopping time, respectively, and define $w^h(\cdot)$ and $N^h(\cdot)$ as in (4.5) and (4.9), respectively. The sequence
$$(\psi^h(\cdot), w^h(\cdot), N^h(\cdot), \rho_h)$$
is tight, and we can assume that the $\rho_h$ satisfy the bound in (6.3) for all $h$ and $x \in G^0$. By use of the Markov property, as at the end of Theorem 5.2, it can be shown that this boundedness implies that $\limsup_h E_x(\rho_h)^k < \infty$ for any positive integer $k$. Thus, the sequence of stopping times is uniformly integrable. Let $(x(\cdot), w(\cdot), N(\cdot), \rho)$ denote the limit of a weakly convergent subsequence. Then, analogously to the situation in Theorem 4.1, (6.1) holds for the limit processes and there is a filtration $\mathcal{F}_t$ such that $w(\cdot)$ is an $\mathcal{F}_t$-Wiener process, $N(\cdot)$ is an $\mathcal{F}_t$-Poisson measure, $\rho$ is an $\mathcal{F}_t$-stopping time, and $x(\cdot)$ is adapted to $\mathcal{F}_t$. By the uniform integrability and the weak convergence,
$$W^h(x, \rho_h) = V^h(x) \to W(x,\rho) \ge V(x).$$
To get the reverse inequality, we proceed as in Theorem 5.2 and use a "nice" $\epsilon$-optimal stopping rule for (6.1) and apply it to the chain. Then a weak convergence argument and the fact that $V^h(x)$ is optimal for the chain yields the desired reverse inequality. Let $\epsilon > 0$. First note that there are $\delta > 0$ and $T < \infty$ such that we can restrict the stopping times for (6.1) to take only the values $\{n\delta, n\delta \le T\}$ and increase the cost (6.2) by at most $\epsilon$. Let $\rho_\epsilon$ be an optimal stopping time for (6.1), (6.2) with this restriction. Proceeding as in Section 10.5, we can assume that this $\epsilon$-optimal stopping time is defined by functions $F_n(\cdot)$ which are continuous in the $w$-variables for each value of the other variables and such that the probability law of $\rho_\epsilon$ is determined by $P\{\rho_\epsilon = 0\}$ and, for $n \ge 1$,
$$P\{\rho_\epsilon = n\delta \mid x, w(s), N(s), s \le n\delta,\ \rho_\epsilon > n\delta - \delta\} = F_n\big(x, w(p\theta), N(p\theta,\Gamma_j),\ j \le q,\ p\theta < n\delta\big),$$
where the partitions of $\Gamma$ are as in Theorem 3.1.
As in Section 10.5, the comparison stopping times for the approximating chain are defined via these functions. Let $\rho_h$ be the stopping time [for $\psi^h(\cdot)$] which is analogous to $\rho_\epsilon$. That is, define $\sigma_n^h$ as above (5.8) and let the probability law of $\rho_h$ [which will take values $\{\sigma_n^h, n < \infty\}$] be determined by $P\{\rho_h = 0\} = P\{\rho_\epsilon = 0\}$ and, for $n \ge 1$,
$$P\{\rho_h = \sigma_n^h \mid x, \psi^h(s), w^h(s), N^h(s), s \le \sigma_n^h,\ \rho_h > \sigma_{n-1}^h\} = F_n\big(x, w^h(p\theta), N^h(p\theta,\Gamma_j),\ j \le q,\ p\theta < \sigma_n^h\big).$$
As in Theorem 5.2, the proof is completed by a weak convergence argument and use of the uniqueness of the solution to (6.1). $\blacksquare$
11 Convergence for Reflecting Boundaries, Singular Control and Ergodic Cost Problems

The development of the convergence proofs of Chapter 10 is continued, but


applied to the problem classes of Chapters 7 and 8. The reflecting bound-
ary and discounted cost problem is covered in Section 11.1. The primary
mathematical difficulty with which we must contend is the proof of tight-
ness of the "reflecting process." The problem is avoided by use of a time
rescaling method, under which all the processes are tight. After proving the
weak convergence of the rescaled processes and characterizing the limits,
the rescaling is inverted to obtain the desired results. This "inversion" is
possible due to the conditions imposed on the allowable reflection direc-
tions. The time rescaling idea appears to be a rather powerful tool.
In Section 11.2, the rescaling method is used for the singular control
problem. We treat the singular control problem with a reflecting boundary,
but an absorbing boundary could be used as well. The "ergodic" or average
cost per unit time problem is covered in Section 11.3 for the "ordinary"
control case. It is also noted that the weak limit of any weakly convergent
sequence of stationary measures for the 1/Jh(-) processes [recall the J.Lh(-)
of (7.5.7)] is a stationary measure for the limit process x(·). Hence the
methods can be used or the computation of invariant measures.

11.1 The Reflecting Boundary Problem


11.1.1 The system model and Markov chain approximation
Recall the system model with reflecting boundary from Sections 1.4 and
5.7. Rewrite (5.7.1) (with a jump term added) in relaxed control notation
as follows

$$x(t) = x + \int_0^t\!\!\int_U b(x(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t\sigma(x(s))\,dw(s) + \int_0^t\!\!\int_\Gamma q(x(s-),\rho)\,N(ds\,d\rho) + z(t), \tag{1.1}$$
where $m(\cdot)$ is an admissible control and the reflection term satisfies
$$|z|(t) = \text{variation of } z(\cdot) \text{ on } [0,t] = \int_0^t I_{\partial G}(x(s))\,d|z|(s),$$
$$z(t) = \int_0^t\gamma(s)\,d|z|(s), \quad \gamma(s) \in r(x(s)), \tag{1.2}$$
and $z(\cdot)$ is continuous. The function $r(\cdot)$ was defined in Sections 1.4 and 5.7.
In order to avoid trivialities in defining the "reflection" or "projection" properties if a jump of $N(\cdot)$ takes $x(\cdot)$ out of $G$, as in Chapter 1 we simply assume that $x + q(x,\rho) \in G$ for $x \in G$ and any $\rho \in \Gamma$.
Throughout this section, the conditions (A10.1.1)–(A10.1.4) will be assumed to hold, where applicable, as will be the assumptions on the set $G$ and on the reflection directions and approximating chain given in Section 5.7. It is also assumed that for each initial condition and $\epsilon_0 > 0$, there is an $\epsilon_0$-optimal process $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N(\cdot))$ which is weak sense unique. The conditions on $G$ and the reflection directions and an argument similar to that in Theorem 1.8 can be used to show that $z(\cdot)$ is continuous.
The cost function of interest is (5.8.18), which we rewrite in relaxed control notation as
$$W(x,m) = E_x^m\int_0^\infty\!\!\int_U e^{-\beta s}\big[k(x(s),\alpha)\,m_s(d\alpha)\,ds + c'(x(s))\,dz(s)\big], \tag{1.3a}$$
where we suppose that $c'(x)\gamma \ge 0$ for all $\gamma \in r(x)$ and all $x$. If $G$ is a convex polyhedron where $z(\cdot)$ has the representation (5.7.3):
$$z(t) = \sum_i r_i Y_i(t),$$
then, for $c_i \ge 0$, use
$$W(x,m) = E_x^m\int_0^\infty\!\!\int_U e^{-\beta s}\big[k(x(s),\alpha)\,m_s(d\alpha)\,ds + c'\,dY(s)\big]. \tag{1.3b}$$

Remark on Discontinuous Dynamical and Cost Terms. The com-


ments in Chapter 10 concerning extension to discontinuous b(·) and k(·)
also hold for the problems of this and the next section, and similarly for
discontinuous c( ·). There will be an analogous result for the ergodic cost
problem.

An Auxiliary Result. The following theorem on the integrability properties of the variation of the reflection process will be useful later, when uniform integrability of the variation of the increments of $z(\cdot)$ and $z^h(\cdot)$ is needed.

Theorem 1.1. Under the conditions of this section,
$$\lim_{T\to 0}\ \sup_{m,\,x\in G}E_x^m|z|^2(T) = 0. \tag{1.4}$$
Also, for each $T < \infty$,
$$\sup_{m,\,x\in G}E_x^m|z|^2(T) < \infty. \tag{1.4'}$$
For the polyhedral $G$ case, where the representation (5.7.3) holds, (1.4) and (1.4') hold with $Y(\cdot)$ replacing $z(\cdot)$.

Proof. Inequality (1.4') follows from (1.4), and the last assertion follows from the conditions on the boundary and (1.4) and (1.4'). We start by using the upper bound on the variation given in [42, Theorem 3.4], with a different notation. For any set $D$ and $\delta > 0$, define the neighborhood $N_\delta(D) = \{x : \inf_{y\in D}|x-y| < \delta\}$. By the proof of the cited theorem, our conditions on the boundary $\partial G$ and reflection directions in conditions (i)-(v) of Section 5.7 imply the following: there are $\delta > 0$, $L < \infty$, open sets $D_i$, $i \le L$, and vectors $v_i$, $i \le L$, such that $\cup_i D_i \supset G$, and if $x \in N_\delta(D_i) \cap \partial G$, then $v_i'\gamma > \delta$ for all $\gamma \in r(x)$.
The jump and drift terms are unimportant in the proof and we drop them henceforth. Then, write the simplified (1.1) as
$$x(t) = x + R(t) + z(t), \qquad R(t) = \int_0^t\sigma(x(s))\,dw(s). \tag{1.5}$$
Fix $T < \infty$. Define a sequence of stopping times $\beta_n$ and indices $i_n$ recursively as follows: set $\beta_0 = 0$. Let $i_0$ be such that $x(0) \in D_{i_0}$. Set $\beta_1 = \inf\{t : x(t) \notin N_\delta(D_{i_0})\}$. Define $i_1$ such that $x(\beta_1) \in D_{i_1}$. Continue in the same way to define all $\beta_n$ and $i_n$. By the definition of the $\beta_i$,
$$|x(\beta_i) - x(\beta_{i-1})| \ge \delta/2.$$
By the proof of [42, Theorem 3.4], the conditions in the first paragraph imply that
$$v_{i_{m-1}}'\big(x(\beta_m) - x(\beta_{m-1})\big) - v_{i_{m-1}}'\big(R(\beta_m) - R(\beta_{m-1})\big) \ge \delta\big(|z|(\beta_m) - |z|(\beta_{m-1})\big). \tag{1.6}$$

Define $N_T = \min\{n : \beta_n \ge T\} - 1$. Then, using the fact that the $x(t)$ are uniformly bounded, there is $\delta_1 < \infty$ such that
$$|z|(T) \le \delta_1 N_T - \sum_{m=1}^{N_T}v_{i_{m-1}}'\big[R(\beta_m\wedge T) - R(\beta_{m-1}\wedge T)\big] + \delta_1|x(T) - x|, \tag{1.7}$$
where $x = x(0)$. Then, using the boundedness of $x(t)$, there are $\delta_2 < \infty$ and a nonanticipative and bounded process $a(\cdot)$ such that
$$E_x^m|z|^2(T) \le \delta_2 E_x^m N_T^2 + \delta_2 E_x^m\Big|\int_0^T a(s)\,dw(s)\Big|^2 + \delta_2 E_x^m|x(T) - x|^2. \tag{1.8}$$

We need only estimate $N_T$ and the right hand term in (1.8). There is $\delta_3 < \infty$ such that
$$P_x^m\Big\{\sup_{s\le T}|R(s)| \ge \delta/4\Big\} \le 16 E_x^m|R(T)|^2/\delta^2 \le \delta_3 T. \tag{1.9a}$$
An argument analogous to the last part of the proof of Theorem 1.8 below implies that there is $f_1(\cdot)$ such that
$$\sup_{m,\,x}P_x^m\{|z|(T) \ge \delta/4\} \le f_1(T), \tag{1.9b}$$
where $f_1(T)$ goes to zero as $T \to 0$. Via (1.9a,b) and the boundedness of $G$, we have
$$\sup_{m,\,x}E_x^m|x(T) - x|^2 \le f_2(T), \tag{1.9c}$$
where $f_2(T)$ goes to zero as $T \to 0$. Now, given $\epsilon$ small and positive, let $T$ be small enough such that $f_1(T) + \delta_3 T \le \epsilon$. Then
$$\sup_{m,\,x}P_x^m\Big\{\sup_{s\le T}|x(s) - x| \ge \delta/2\Big\} \le \epsilon.$$
This implies that
$$\sup_{m,\,x}P_x^m\{\beta_1 \le T\} \le \epsilon.$$
By a recursion using the Markov property, we get
$$\sup_{m,\,x}P_x^m\{N_T = k\} \le \epsilon^k, \quad k > 0,$$
which yields that $\sup_{m,\,x}E_x^m(N_T)^k \to 0$ as $T \to 0$ for any $k < \infty$. This last fact, together with (1.9c) and (1.8), yields (1.4). $\blacksquare$

The Approximating Markov Chain. Let $\{\xi_n^h, n < \infty\}$ be a Markov chain which is locally consistent with the reflecting jump diffusion (1.1)–(1.2) in the sense of Section 5.7, and let $u^h = \{u_n^h, n < \infty\}$ be an admissible
control sequence. Define the processes w^h(·) as in (10.4.5). Recall the representation (5.7.5) of the continuous parameter Markov process ψ^h(·) which is appropriate for the problem with a reflecting boundary, and which we now rewrite in relaxed control notation
$$\psi^h(t) = x + \int_0^t \int_U b(\psi^h(s), \alpha)\, m_s^h(d\alpha)\,ds + M^h(t) + J^h(t) + z^h(t) + \epsilon^h(t) + \delta_1^h(t), \tag{1.10}$$
where z^h(·) is defined in Section 5.7, and
$$M^h(t) = \int_0^t \sigma(\psi^h(s))\,dw^h(s),$$

and E|δ_1^h(t)| → 0, uniformly in t in any bounded set. The cost functions which we use for the Markov chain interpolation are those in (5.8.19), and we rewrite them here as
$$W^h(x, u^h) = E_x^{u^h} \sum_{n=0}^{\infty} e^{-\beta t_n^h}\big[k(\xi_n^h, u_n^h)\Delta t_n^h + c'(\xi_n^h)\Delta z_n^h\big], \tag{1.11a}$$
and, for the polyhedral G where (5.7.3) holds,
$$W^h(x, u^h) = E_x^{u^h} \sum_{n=0}^{\infty} e^{-\beta t_n^h}\big[k(\xi_n^h, u_n^h)\Delta t_n^h + c'\Delta Y_n^h\big]. \tag{1.11b}$$

Because the instantaneous reflection states x ∈ ∂G_h^+ do not appear in ψ^h(·), the cost functions (1.11) can only be approximated by a cost function for ψ^h(·) if there is a boundary cost. Let us rewrite (1.11a) in the following relaxed control form, where m^h(·) is the relaxed control representation of u^h(·):
$$E_x^{u^h} \int_0^{\infty} e^{-\beta s}\big[k(\psi^h(s), u^h(s))\,ds + c'(\psi^h(s))\,dz^h(s)\big] + \epsilon^h$$
$$= E_x^{m^h} \int_0^{\infty}\int_U e^{-\beta s}\big[k(\psi^h(s), \alpha)\,m^h(d\alpha\,ds) + c'(\psi^h(s))\,dz^h(s)\big] + \epsilon^h, \tag{1.12}$$
with the analogous form for (1.11b).
The error term ε^h is due to the approximation of the states on ∂G_h^+ by the previous state in G_h. By (5.7.4), it satisfies
$$\epsilon^h \le \int_0^{\infty} e^{-\beta s}\, \epsilon_3^h(s)\, dE|z^h|(s), \tag{1.12'}$$
where ε_3^h(s) → 0, uniformly in s, as h → 0. As discussed in Chapter 4, (1.11) equals (modulo ε^h) (1.12) if the discount factor e^{−βΔt^h(x,α)} in (1.11) is approximated by 1/(1 + βΔt^h(x,α)). Otherwise the two are asymptotically equivalent.

11.1.2 Weak convergence of the approximating processes

A Problem with Weak Convergence. The main problem with carrying


over the type of weak convergence arguments used in Sections 10.4 to 10.6
is due to the fact that without additional information, we cannot show that
the reflection terms {zh ( ·)} are tight in the Skorokhod topology. In special
cases, such as in the analysis of the sort of reflected processes which arise
in "heavy traffic" analysis, there are theorems which allow us to represent
these reflection terms as continuous functions of the other processes on
the right side of (1.10), all of which can be shown to be tight. Perhaps the
basic such "continuity theorem" is the so-called reflection mapping theorem
of [69, 129] and its extension to sets G which are "boxes" in [118]. Such
methods were used in [94, 107, 118] for applications to control problems
and have been in common use in the study of the so-called heavy traffic
limits [129]. These continuity maps are very convenient where they can
be applied; e.g., to the heavy traffic problems. For more general reflecting
boundaries, the main results are those of [41, 115]. The continuity maps in
[115] were used in the numerical approximations in [94], but do not allow
us as much freedom as we would like for the approximation of both the
processes on the boundary and the boundary itself. See [100, Chapter 3]
for a discussion of the general problem of continuity of the reflection process
as a function of the other processes.
Here we will take a somewhat more general approach, which will also set
the stage for the treatment of the singular control problem. At first, we will
not quite get weak convergence of the approximating processes ψ^h(·), but we will be able to get something very close to it, and, more importantly, we will still be able to prove that V^h(x) → V(x). The method to be used involves a (random) rescaling of time such that the rescaled processes are tight. We then take limits of the rescaled processes and invert the rescaling to get the desired results. The method seems to have been first used in [118] for a treatment of the routing control problem in heavy traffic, and for the numerical problem for singular stochastic control in [106]. This latter reference used the continuous parameter interpolation ξ^h(·) of Section 4.2 and not the Markov chain interpolation ψ^h(·) as used here, but that does not change the results. A slight extension of the proofs for the time rescaling method will be used in Theorem 1.7 to get actual weak convergence.

A Rescaling of Time: The "Stretched Out" Processes. Recall the definition (Section 4.3) of Δτ_n^h and τ_n^h, the interjump and jump times for ψ^h(·); also, τ_n^h = τ_{n+1}^h if ξ_n^h ∈ ∂G_h^+ (the reflection states are instantaneous in the interpolation). Recall that a step n of the chain {ξ_n^h, n < ∞} is called a reflection step if ξ_n^h ∈ ∂G_h^+, the numerical approximation of the reflecting boundary. Define Δτ̂_n^h by
$$\Delta\hat\tau_n^h = \begin{cases} \Delta\tau_n^h & \text{for a nonreflection step,} \\ |\Delta z_n^h| & \text{for a reflection step.} \end{cases} \tag{1.13}$$
Define
$$\hat\tau_n^h = \sum_{i=0}^{n-1} \Delta\hat\tau_i^h.$$
Define the stretched out time scale T̂^h(·) as follows: T̂^h(0) = 0, the derivative of T̂^h(·) is unity on [τ̂_n^h, τ̂_{n+1}^h] if step n is not a reflection step (i.e., ξ_n^h ∈ G_h), and is zero otherwise. Thus, in particular,
$$\hat T^h(\hat\tau_n^h) = \tau_n^h. \tag{1.14}$$
The new time scale is illustrated in Figure 11.1, where ξ_1^h and ξ_2^h are in ∂G_h^+. Now define the rescaled or "stretched out" processes (denoted as the "hat" processes) ψ̂^h(·), û^h(·), etc., by
$$\hat\psi^h(t) = \psi^h(\hat T^h(t)), \quad \hat u^h(t) = u^h(\hat T^h(t)), \quad \hat m^h(d\alpha, t) = m^h(d\alpha, \hat T^h(t)),$$
etc. The time scale is stretched out at the reflection steps by an amount equal to the absolute value of the conditional mean value of the increment Δξ_n^h, namely by |Δz_n^h|. For the case in the figure, |z^h|(t) = 0 until τ_1^h, then it jumps by |Δz_1^h| + |Δz_2^h|, and then it is constant until the next reflection step.

h
5

Figure 11.1. The stretched out time scale.
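To make the construction concrete, the following minimal Python sketch (an illustration, not part of the original text) builds the stretched-out clock from a simulated list of chain steps. The step records, with their hypothetical fields `kind`, `dt`, and `dz`, are assumed inputs; `tau_hat` plays the role of the times τ̂_n^h and `T_hat` the values of T̂^h at those times, as in (1.13)-(1.14).

```python
import numpy as np

def stretched_time_scale(steps):
    """Compute the stretched interjump times and the rescaling T^h at the
       stretched jump times, per (1.13)-(1.14).  Each step is a dict with
       'kind' ('diffusion' or 'reflection'), 'dt' (interpolation interval),
       and 'dz' (|conditional mean reflection increment|, reflection steps)."""
    dtau_hat, dT = [], []
    for s in steps:
        if s['kind'] == 'reflection':
            dtau_hat.append(abs(s['dz']))  # clock advances by |dz|, real time frozen
            dT.append(0.0)                 # slope of T^h is zero on a reflection step
        else:
            dtau_hat.append(s['dt'])       # clock advances by the interpolation interval
            dT.append(s['dt'])             # slope of T^h is one on a nonreflection step
    tau_hat = np.concatenate(([0.0], np.cumsum(dtau_hat)))  # stretched times tau_hat_n
    T_hat = np.concatenate(([0.0], np.cumsum(dT)))          # real times T^h(tau_hat_n)
    return tau_hat, T_hat

# tiny illustration: two reflection steps sandwiched between diffusion steps
steps = [dict(kind='diffusion', dt=0.01),
         dict(kind='reflection', dz=0.004),
         dict(kind='reflection', dz=0.003),
         dict(kind='diffusion', dt=0.01)]
tau_hat, T_hat = stretched_time_scale(steps)
print(tau_hat)  # the stretched clock; Lipschitz-1 in the reflection increments
print(T_hat)    # equals the original interpolated times at nonreflection steps
```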

With the above definitions and (1.10), we have
$$\hat\psi^h(t) = x + \int_0^t \int_U b(\hat\psi^h(s), \alpha)\,\hat m^h(d\alpha\,ds) + \hat M^h(t) + \hat J^h(t) + \hat z^h(t) + \hat\epsilon^h(t) + \hat\delta_1^h(t). \tag{1.15}$$

Note that
$$\hat m^h(d\alpha\,ds) = m^h_{\hat T^h(s)}(d\alpha)\,d\hat T^h(s).$$
Thus, it equals zero on the intervals corresponding to reflection steps (i.e., where T̂^h(·) is flat), and takes its usual values otherwise. The process M̂^h(·) is a martingale with quadratic variation
$$\int_0^t a(\hat\psi^h(s))\,d\hat T^h(s) + \hat\delta_2^h(\hat T^h(t))$$
(where E sup_{s ≤ T̂^h(t)} |δ̂_2^h(s)| → 0 as h → 0), and (5.7.6) becomes
$$E \sup_{s\le t} |\hat\epsilon^h(s)|^2 = O(h)\, E|\hat z^h|(t). \tag{1.16}$$

Comment. The rescaling stretches the processes in (1.10) so that they are smoother. In fact, the piecewise linear path which connects the points |ẑ^h|(τ̂_n^h) is Lipschitz continuous with coefficient unity. Also, T̂^h(·) is Lipschitz continuous with coefficient unity. Thus, any weak limit of these processes will have Lipschitz continuous paths with Lipschitz constant unity. The processes z^h(·), for which we might have had considerable difficulty in proving tightness, are now stretched out enough so that they are quite "tame." We will do the weak convergence analysis for the rescaled processes. Then, via an inverse time transformation of the limit processes, we obtain the desired results of the type W^h(x, m^h) → W(x, m) and V^h(x) → V(x). The weak convergence of z^h(·) itself is not required to get convergence of the cost functions of interest.

Weak Convergence of the Approximating Processes.

Theorem 1.2. Assume the conditions of Subsection 11.1.1 and let {u_n^h, n < ∞} be an admissible control sequence for the approximating Markov chain. Then the sets of processes
$$\hat\Psi^h(\cdot) = \{\hat\psi^h(\cdot), \hat m^h(\cdot), \hat w^h(\cdot), \hat T^h(\cdot), \hat N^h(\cdot), \hat J^h(\cdot), \hat z^h(\cdot)\},$$
$$Q^h(\cdot) = \{m^h(\cdot), w^h(\cdot), N^h(\cdot)\}, \qquad \{\hat\epsilon^h(\cdot), \hat\delta_1^h(\cdot)\}$$
are tight. The ε̂^h(·) and δ̂_1^h(·) converge weakly to the zero process. Let h index a weakly convergent subsequence of {Ψ̂^h(·), Q^h(·)} with limit
$$\hat\Psi = \{\hat x(\cdot), \hat m(\cdot), \hat w(\cdot), \hat T(\cdot), \hat N(\cdot), \hat J(\cdot), \hat z(\cdot)\}, \qquad Q(\cdot) = \{m(\cdot), w(\cdot), N(\cdot)\}.$$
The pair (w(·), N(·)) are a standard Wiener process and Poisson measure, respectively, with respect to the natural filtration, and m(·) is admissible. Also, x̂(t) ∈ G. Let F̂_t denote the σ-algebra which is generated

by {Ψ̂(s), s ≤ t}. Then ŵ(t) = w(T̂(t)) and is an F̂_t-martingale with quadratic variation T̂(t)I. Also, N̂(t) = N(T̂(t)) and
$$\hat J(t) = \int_0^t \int_\Gamma q(\hat x(s-), \rho)\,\hat N(ds\,d\rho).$$
The limit processes satisfy
$$\hat x(t) = x + \int_0^t \int_U b(\hat x(s), \alpha)\,\hat m(d\alpha\,ds) + \int_0^t \sigma(\hat x(s))\,d\hat w(s) + \hat J(t) + \hat z(t). \tag{1.17}$$
The process ẑ(·) can change only at those t for which x̂(t) ∈ ∂G. It is differentiable and the derivative satisfies
$$\frac{d}{dt}\hat z(t) \in r(\hat x(t)).$$

Proof. Tightness of {Ψ̂^h(·)} is proved as tightness was proved in Theorem 10.4.1, and similarly for the limiting properties of δ̂_1^h(·). Let h index a weakly convergent subsequence. Equation (1.16) and the boundedness of {E|ẑ^h|(t)} imply the zero limit of ε̂^h(·). Theorem 10.4.1 implies that w(·) and N(·) are as asserted, as well as the admissibility of m(·). Because T̂(·) and ŵ(·) are continuous, we have ŵ(t) = w(T̂(t)). The proofs of the assertions concerning N̂(t) = N(T̂(t)) and the representation of Ĵ(·) are straightforward and are omitted. Because ξ_n^h ∈ G_h ∪ ∂G_h^+, it follows that x̂(t) ∈ G.
For any function φ(·), define the piecewise constant approximating function φ_δ(·) as in Theorem 10.4.1. Then [analogously to (10.4.12)] we can write
$$\hat\psi^h(t) = x + \int_0^t \int_U b(\hat\psi^h(s), \alpha)\,\hat m^h(d\alpha\,ds) + \int_0^t \sigma(\hat\psi_\delta^h(s))\,d\hat w^h(s) + \hat J^h(t) + \hat z^h(t) + \hat\epsilon^h(t) + \hat\delta_1^h(t) + \hat\epsilon^{h,\delta}(t), \tag{1.18}$$
where lim_{δ→0} limsup_{h→0} E|ε̂^{h,δ}(t)| = 0, and the convergence is uniform in t on any bounded interval. Taking limits in (1.18) yields
$$\hat x(t) = x + \int_0^t \int_U b(\hat x(s), \alpha)\,\hat m(d\alpha\,ds) + \int_0^t \sigma(\hat x_\delta(s))\,d\hat w(s) + \hat J(t) + \hat z(t) + \hat\epsilon^{\delta}(t), \tag{1.19}$$
where lim_{δ→0} E|ε̂^δ(t)| = 0. If ŵ(·) can be shown to be an F̂_t-martingale, then we could take limits in (1.19), as was done in Theorem 10.4.1, and get the representation (1.17).

For the "hatted" processes, redefine (¢, m)t to be

(¢,m)t =lot ¢(a,s)m(dads).

The desired martingale property can be obtained by proceeding as in The-


orem 10.4.1 and showing that

EH(x(ti),w(ti),(¢j,m)t;~N(ti,fJ),z(ti),j ~ q,i ~P)


{1.20)
X [w(t + u)- w(t)j = 0,

where the rj are as used in Theorem 10.4.1.


By the upper semicontinuity property of r( ·) (the condition (v) in Section
5.7.3) and the fact that ~z~ is in a direction in r(x) if e~ =X E aGt, we
have that (d/dt)z(t) E r(x(t)). •

Remark. Note for future reference that


EH(x(ti), w(ti), (¢j, m)tp N(ti, rj), z(ti),i ~ q, i ~ p)
x[w(t + u)w'(t + u)- w(t)w'(t)- (T(t + u)- T(t))I] = o.
(1.21)

Theorem 1.3. Under the conditions of Subsection 11.1.1,
$$\lim_{T\to 0}\ \limsup_{h\to 0}\ \sup_{m^h,\,x} E_x^{m^h} |z^h|^2(T) = 0. \tag{1.22}$$
Also, for any T < ∞,
$$\limsup_{h\to 0}\ \sup_{m^h,\,x} E_x^{m^h} |z^h|^2(T) < \infty. \tag{1.23}$$

Remark on the Proof. The proof parallels that of Theorem 1.1. It uses the validity of (1.6) for ψ^h(·) and z^h(·) (for small h) and the fact that, for any ε > 0 and sequence {m^h(·)} of admissible controls,
$$\lim_{T\to 0}\ \limsup_{h}\ \sup_{m^h,\,x} P_x^{m^h}\{|z^h|(T) \ge \epsilon\} = 0.$$
The latter fact can be proved by an argument for the "stretched out" or "hatted" processes of the type used in the last part of Theorem 1.7 below.

Theorem 1.4. Assume the conditions of Subsection 11.1.1, and let (Ψ̂(·), Q(·)) denote the limits of a weakly convergent subsequence as in Theorem 1.2. Define the inverse
$$T(t) = \inf\{s : \hat T(s) > t\}.$$



Then, T(·) is right continuous and T(t) → ∞ as t → ∞, with probability one. For any process φ̂(·), define the "inverse" φ(t) = φ̂(T(t)), and let F_t denote the minimum σ-algebra which measures {Ψ(s), s ≤ t}. Then w(·) and N(·) are a standard F_t-Wiener process and Poisson measure, respectively. Also, m(·) is admissible with respect to (w(·), N(·)) and (1.1) and (1.2) hold.

Proof. Inequality (1.23) implies that T̂(t) → ∞ with probability one as t → ∞. Thus, T(t) exists for all t and T(t) → ∞ as t → ∞ with probability one. By (1.20) and (1.21) we also have
$$E\,H\big(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i, \Gamma_j), z(t_i),\ j \le q, i \le p\big)\,\big[w(t + u) - w(t)\big] = 0,$$
$$E\,H\big(x(t_i), w(t_i), (\phi_j, m)_{t_i}, N(t_i, \Gamma_j), z(t_i),\ j \le q, i \le p\big)\,\big[w(t+u)w'(t+u) - w(t)w'(t) - uI\big] = 0.$$
Thus, w(·) is an F_t-Wiener process. We omit the details concerning the fact that N(·) is an F_t-Poisson measure. It follows that m(·) is admissible with respect to (w(·), N(·)), using the filtration F_t. Finally, a rescaling in (1.17) yields that (1.1) and (1.2) hold. ∎

The Limits of the Cost Functions. The next theorem shows that the costs V^h(x) and W^h(x, m^h) converge to the costs for the limit processes and that
$$\liminf_h V^h(x) \ge V(x). \tag{1.24}$$
Given (1.24), in order to complete the proof of the convergence
$$V^h(x) \to V(x), \tag{1.25}$$
we will use a method (in Theorem 1.7) which is similar to that in Theorem 10.5.2.

Theorem 1.5. Assume the conditions of Subsection 11.1.1, and let h index a weakly convergent subsequence of {Ψ̂^h(·), Q^h(·)} with limits as denoted in Theorem 1.2. Then, for (1.11a),
$$W^h(x, m^h) \to E_x^m \int_0^{\infty}\int_U e^{-\beta \hat T(t)}\big[k(\hat x(t), \alpha)\,\hat m(d\alpha\,dt) + c'(\hat x(t))\,d\hat z(t)\big]$$
$$= E_x^m \int_0^{\infty}\int_U e^{-\beta t}\big[k(x(t), \alpha)\,m(d\alpha\,dt) + c'(x(t))\,dz(t)\big] = W(x, m), \tag{1.26}$$
with an analogous development for (1.11b). Furthermore, (1.24) holds.

Proof. We concentrate on (1.11a). By Theorem 1.3, we have uniform integrability of
$$\{|z^h|(n+1) - |z^h|(n);\ n,\ m^h,\ x,\ h\} \tag{1.27a}$$

or of
$$\{|Y^h|(n+1) - |Y^h|(n);\ n,\ m^h,\ x,\ h\}, \tag{1.27b}$$
according to the case of interest. The integrability properties of (1.27) imply that T̂^h(s) → ∞ as s → ∞, with probability one. Thus, the cost (1.12) can be written as
$$W^h(x, m^h) = E_x^{m^h} \int_0^{\infty}\int_U e^{-\beta \hat T^h(s)}\big[k(\hat\psi^h(s), \alpha)\,\hat m^h(d\alpha\,ds) + c'(\hat\psi^h(s-))\,d\hat z^h(s)\big] + \epsilon^h.$$
By the uniform integrability of (1.27), ε^h → 0.
Note that, by the definition of T̂^h(·) and the integrability properties of (1.27),
$$E_x^{m^h} \int_T^{\infty} e^{-\beta \hat T^h(s)}\, d|\hat z^h|(s) \le E_x^{m^h} \int_{\min\{t:\, T^h(t) \ge T\}}^{\infty} e^{-\beta s}\, d|z^h|(s) \to 0,$$
uniformly in h as T → ∞. Also, due to the tightness and the uniform integrability properties of (1.27), for any T,
$$\int_0^T e^{-\beta \hat T^h(s)}\, c'(\hat\psi^h(s-))\,d\hat z^h(s)$$
can be arbitrarily well approximated (uniformly in h) by a finite Riemann sum for which the number of terms does not depend on h. These facts, the uniform integrability properties of (1.27), and the weak convergence imply that W^h(x, m^h) converges to
$$W(x, m) = E_x^m \int_0^{\infty}\int_U e^{-\beta \hat T(s)}\big[k(\hat x(s), \alpha)\,\hat m(d\alpha\,ds) + c'(\hat x(s))\,d\hat z(s)\big].$$
By an inverse transformation, this equals
$$W(x, m) = E_x^m \int_0^{\infty}\int_U e^{-\beta s}\big[k(x(s), \alpha)\,m(d\alpha\,ds) + c'(x(s))\,dz(s)\big].$$
To show (1.24), let the m^h(·) above be a minimizing sequence. Then extract a weakly convergent subsequence, also indexed by h, and with limits denoted as above, to get W^h(x, m^h) → W(x, m) ≥ V(x). ∎

In order to complete the development, we need the following analogues of the assumptions used in Chapter 10.

A1.1. For each ε > 0 and initial condition of interest, there is an ε-optimal solution (x(·), z(·), m(·), w(·), N(·)) to (1.1), (1.2) which is unique in the weak sense. That is, the distribution of (m(·), w(·), N(·)) implies that of (x(·), z(·), m(·), w(·), N(·)).

A1.2. Let u(·) be an admissible ordinary control with respect to (w(·), N(·)), and suppose that u(·) is piecewise constant and takes only a finite number of values. Then, for each initial condition there exists a weak sense solution to (1.1) and (1.2), where m(·) is the relaxed control representation of u(·), and this solution is unique in the weak sense.

A1.3. For the case where G is a convex polyhedron and the representation z(t) = Σ_i r_i Y_i(t) is used only: Either (a) or (b) holds. (a) The covariance a(x) is nondegenerate for each x. (b) Let c_i > 0, so that there is a positive cost associated with boundary face ∂G_i. Then at each edge or corner which involves ∂G_i, the set of reflection directions on the adjoining faces are linearly independent.

A Comparison Control. The proof of the following theorem, which will be used in Theorem 1.7, is nearly the same as that of Theorem 10.3.1.

Theorem 1.6. Assume the other conditions of this section, (A1.1)-(A1.3), and either cost function in (1.3). Fix ε_0 > 0, and let (x(·), z(·), m(·), w(·), N(·)) be an ε_0-optimal solution whose existence is asserted in (A1.1). Then, for each ε > 0, there is a δ > 0 and a probability space on which are defined a pair (w^ε(·), N^ε(·)), a control u^ε(·) of the type introduced in Theorem 10.1.2, and a solution (x^ε(·), z^ε(·)) to (1.1) such that
$$|W(x, m^\epsilon) - W(x, m)| \le \epsilon.$$
There is θ > 0 and a partition {Γ_j, j ≤ q} of Γ such that the approximating u^ε(·) can be chosen so that its probability law at any time nδ, conditioned on {w^ε(s), N^ε(s), s ≤ nδ; u^ε(iδ), i < n}, depends only on the initial condition x = x(0) and on the samples
$$\{w^\epsilon(p\theta),\ N^\epsilon(p\theta, \Gamma_j),\ j \le q,\ p\theta < n\delta;\ u^\epsilon(i\delta),\ i < n\},$$
and is continuous in the x, w^ε(pθ) arguments for each value of the other arguments.

Theorem 1.7. Under the other conditions of this section and (A1.1)-(A1.3), (1.25) holds.

Proof. The proof is similar to that of Theorem 10.5.2. Use the assumed uniqueness conditions and the uniform integrability of
$$\{|z|(n + 1) - |z|(n);\ n,\ m,\ x(0)\} \tag{1.28}$$
(which follows from Theorem 1.1) to get a comparison control of the type used in Theorem 10.5.2. Then use the weak convergence results of Theorems 1.2, 1.4, and 1.5. ∎

Tightness of {Ψ^h(·)} and Continuity of T(·). Theorem 1.2 actually implies that the original (unrescaled) processes {Ψ^h(·)} are tight. This will now be proved in detail because it will be needed in Section 11.3, and at the same time the argument used to get (1.9b) will be elaborated on. Theorem 1.8 also implies the continuity of z(·), as well as the existence of a weak sense solution to (1.1) and (1.2).

Theorem 1.8. Assume the conditions of Subsection 11.1.1. Let h index a weakly convergent sequence of {Ψ̂^h(·)} and define T(·) as in Theorem 1.4. Then T(t) exists for all t. It is continuous with probability one and Ψ^h(·) ⇒ Ψ(·).

Proof. Define T^h(t) = inf{s : T̂^h(s) > t}. Suppose that T(t) exists for each t and T(·) is continuous, with probability one. Then {T^h(·)} must be tight and the weak limit must equal T(·) with probability one. Then, since Ψ^h(t) = Ψ̂^h(T^h(t)), the weak convergence and the continuity of T(·) yield Ψ^h(·) ⇒ Ψ(·). Thus, we need only prove the existence of T(t) for each t and the continuity of T(·).
For the rest of the proof, we drop the jump term J(·), because we can always work "between" the jumps. Suppose that the inverse T(t) does not exist for all t. Then, loosely speaking, there must be a sequence of intervals whose length does not go to zero and such that T̂^h(·) "flattens" out on them. More particularly, there are ρ_0 > 0, ε_0 > 0, t_0 > 0, and a sequence of random variables {ν_h} such that for all ε > 0,
$$P\big\{\hat T^h(\nu_h + t_0) - \hat T^h(\nu_h) \le \epsilon,\ |\hat z^h|(\nu_h + t_0) - |\hat z^h|(\nu_h) \ge \epsilon_0\big\} \ge \rho_0.$$
Extract a weakly convergent subsequence of
$$\{\hat\psi^h(\nu_h + \cdot),\ \hat m^h(\nu_h + \cdot),\ \hat w^h(\nu_h + \cdot), \ldots\}$$
with limit (x(·), m(·), w(·), ...). Then on a set of probability greater than ρ_0, we have
$$x(t) \in \partial G \text{ on } [0, t_0], \qquad \hat T(t_0) = \hat T(0) = 0, \qquad |z|(t_0) > 0.$$
The fact that |z|(t_0) > 0 as asserted above is implied by the weak convergence and the boundary conditions (i)-(v) of Section 5.7.3.

Thus, on this set dx(t) = dz(t) on [0, t_0] and x(t) ∈ ∂G. This violates the conditions on the reflection directions. In particular, the conditions on the boundary and reflection directions (i)-(v) of Section 5.7.3 imply that x(·) cannot remain on the boundary on any time interval on which z(·) is not constant: the possible values of dz(s) on that interval would force x(·) off the boundary. Thus, T(t) exists for all t < ∞ with probability one. The same proof can be applied to yield the continuity of T(·). ∎

11.2 The Singular Control Problem


The system model is (8.1.22'), but we add a jump term J(·), and it is assumed that there is a weak sense unique solution for each initial condition for the uncontrolled problem. The associated cost is (8.1.23'), where k_i ≥ 0, q_i > 0, and we use the assumptions (A10.1.1) to (A10.1.3), where applicable. The spectral radius of the matrix P is less than unity. Also, it is supposed that for each ε_0 > 0, there is an ε_0-optimal solution which is weak sense unique. Throughout this section, suppose that the chain {ξ_n^h, n < ∞} is locally consistent with (8.1.22') in the sense of Section 8.3, and the set G is the "box" used in that section. The model can be generalized in many directions via use of the techniques of Chapter 10 or of the last section. For example, one can add a continuously acting control term, and also treat the optimal stopping problem. The conditions on the boundary can also be weakened as in the last section, provided that the conditions of Section 5.7 continue to hold. We prefer to deal with the simpler case because that is the easiest way to expose the basic structure of the method.
Recall that (see below (8.1.22')) F(·) has the form
$$F(t) = \sum_i v_i F^i(t),$$
where the v_i are given vectors and the F^i(·) are real valued processes. The interpolation ψ^h(·) can be represented as
$$\psi^h(t) = x + \int_0^t b(\psi^h(s))\,ds + M^h(t) + J^h(t) + F^h(t) + (I - P')Y^h(t) - U^h(t) + z^h(t) + \epsilon^h(t) + \delta_1^h(t), \tag{2.1a}$$
where
$$F^h(t) = \sum_i v_i F^{h,i}(t);$$
the terms M^h(t), J^h(t), z^h(t), δ_1^h(t) are as in Section 5.7, and ε^h(t) satisfies (8.3.4). The cost function for the approximating chain is taken to be (8.3.5)

and it can be rewritten as, for k_i ≥ 0, q_i > 0,
$$W^h(x, F^h) = E_x^{F^h} \int_0^{\infty} e^{-\beta t}\Big[k(\psi^h(t))\,dt + \sum_i q_i\,dF^{h,i}(t) + \sum_i k_i\,dU^{h,i}(t)\Big]. \tag{2.1b}$$
Many of the results and methods of Section 11.1 can be carried over without much change. By a slight variation of Theorem 1.1, we have the following result.

Theorem 2.1. Assume the conditions of this section and let E|F(T)|^2 < ∞ for each T < ∞. Then
$$\sup_{x\in G} E_x^F\big(|Y(T)|^2 + |U(T)|^2\big) < \infty$$
for each T < ∞.

The analogue for the approximating chain can be proved along the same lines as used for Theorem 1.1 and is as follows.

Theorem 2.2. Under the assumptions of this section and the condition that
$$\limsup_h \sup_n E|F^h(n+1) - F^h(n)|^2 < \infty, \tag{2.2}$$
we have
$$\limsup_h \sup_n E_x^{F^h}\big(|Y^h(n+1) - Y^h(n)|^2 + |U^h(n+1) - U^h(n)|^2\big) < \infty.$$

In Section 11.1, it was not a priori obvious that the reflection terms {z^h(·)} were tight. We dealt with that problem by use of a stretched out time scale. But in Theorem 1.8, we showed that the {Ψ^h(·)} actually were tight. The situation is more complicated here, because the control sequence {F^h(·)} can always be chosen such that neither it nor the associated sequence of solutions is tight. The time rescaling method of Section 11.1 still works well. In order to use it, we will have to redefine the rescaling to account for the singular control terms. Recall the trichotomy used in Chapter 8, where each step of the chain {ξ_n^h, n < ∞} is either a control step, a reflection step, or a step where the transition function for the uncontrolled and unreflected case is used (which we call a "diffusion" step, even if there is a "Poisson jump"). Redefine Δτ̂_n^h by
$$\Delta\hat\tau_n^h = \begin{cases} \Delta\tau_n^h & \text{for a diffusion step,} \\ |\Delta Y_n^h| + |\Delta U_n^h| & \text{for a reflection step,} \\ |\Delta F_n^h| & \text{for a control step.} \end{cases} \tag{2.3}$$

Recall the definition of T̂^h(·) which was used above (1.14). Redefine T̂^h(·) such that its slope is unity on the interval [τ̂_n^h, τ̂_{n+1}^h] only if n is a diffusion step [that is, if ξ_n^h ∈ G_h and no control is exercised], and the slope is zero otherwise. Now, redefine the processes ψ̂^h(·), etc., with this new scale, analogously to what was done in Section 11.1. In place of (1.15) we have
$$\hat\psi^h(t) = x + \int_0^t b(\hat\psi^h(s))\,d\hat T^h(s) + \hat M^h(t) + \hat J^h(t) + \hat F^h(t) + (I - P')\hat Y^h(t) - \hat U^h(t) + \hat z^h(t) + \hat\epsilon^h(t) + \hat\delta_1^h(t). \tag{2.4}$$
In the present context, Theorems 1.2 and 1.4 can be rewritten as follows.

Theorem 2.3. Assume the conditions of this section. Then the sets of processes
$$\hat\Psi^h(\cdot) = \{\hat\psi^h(\cdot), \hat w^h(\cdot), \hat T^h(\cdot), \hat N^h(\cdot), \hat Y^h(\cdot), \hat U^h(\cdot), \hat F^h(\cdot)\},$$
$$Q^h(\cdot) = \{w^h(\cdot), N^h(\cdot)\}, \qquad \{\hat\delta_1^h(\cdot), \hat\epsilon^h(\cdot)\}$$
are tight. The third set converges to the zero process. Let h index a weakly convergent subsequence of {Ψ̂^h(·), Q^h(·)} with limit
$$\hat\Psi(\cdot) = (\hat x(\cdot), \hat w(\cdot), \hat T(\cdot), \hat N(\cdot), \hat Y(\cdot), \hat U(\cdot), \hat F(\cdot)), \qquad Q(\cdot) = (w(\cdot), N(\cdot)).$$
The w(·) and N(·) are the standard Wiener process and Poisson measure, respectively, with respect to the natural filtration, and x̂(t) ∈ G. Let F̂_t denote the σ-algebra which is generated by {Ψ̂(s), s ≤ t}. Then ŵ(t) = w(T̂(t)) and is an F̂_t-martingale with quadratic variation ∫_0^t a(x̂(s))dT̂(s). Also,
$$\hat J(t) = \int_0^t \int_\Gamma q(\hat x(s-), \rho)\,\hat N(ds\,d\rho),$$
and
$$\hat x(t) = x + \int_0^t b(\hat x(s))\,d\hat T(s) + \int_0^t \sigma(\hat x(s))\,d\hat w(s) + \hat F(t) + \hat J(t) + (I - P')\hat Y(t) - \hat U(t). \tag{2.5}$$
The process Ŷ^i(·) [respectively, Û^i(·)] can change only at those t for which x̂_i(t) = 0 [respectively, x̂_i(t) = B_i]. If (2.2) holds, then Theorem 1.4 continues to hold and the rescaled processes satisfy (8.1.22') with the jump term added.

The limit of the costs can be dealt with via the following theorem.

Theorem 2.4. Assume the conditions of this section and (2.2), and let h index a weakly convergent subsequence of {Ψ̂^h(·), Q^h(·)}. Then, with the other notation of Theorem 2.3 used,
$$W^h(x, F^h) \to E_x^F \int_0^{\infty} e^{-\beta \hat T(t)}\big[k(\hat x(t))\,d\hat T(t) + q'\,d\hat F(t) + k'\,d\hat U(t)\big]$$
$$= E_x^F \int_0^{\infty} e^{-\beta t}\big[k(x(t))\,dt + q'\,dF(t) + k'\,dU(t)\big] = W(x, F). \tag{2.6a}$$
Now, drop the assumption (2.2). Then,
$$\liminf_h V^h(x) \ge V(x). \tag{2.6b}$$

Completion of the Proof that V^h(x) → V(x). In order to complete the proof, we need to find a comparison control as in Theorem 1.6 or 10.5.2. The following conditions will be needed.

A2.1. For each ε > 0 and initial condition of interest, there is an ε-optimal solution (x(·), F(·), w(·), N(·)) to (8.1.22') with the jump term added, and it is unique in the weak sense.

A2.2. Let F(·) be an admissible control with respect to (w(·), N(·)), and suppose that F(·) is piecewise constant and takes only a finite number of values. Then for each initial condition there exists a weak sense solution to (8.1.22') with the jump term added, and this solution is unique in the weak sense.

To construct the comparison control, we will use the following result, which can be proved by an approximation and weak convergence argument. See [106] for the details of a closely related case. Let F^a denote the a-th component of the control vector F.

Theorem 2.5. Assume the conditions of this section and (A2.1), (A2.2). Let ε > 0. There is an ε-optimal admissible solution (x(·), z(·), F(·), w(·), N(·)) to (8.1.22') (with the jump term added) with the following properties. (i) There are T_ε < ∞, δ > 0, θ > 0, k_m < ∞, and ρ > 0, such that F(·) is constant on the intervals [nδ, nδ + δ), only one of the components can jump at a time, and the jumps take values in the discrete set kρ, k = 0, ..., k_m. Also, F(·) is bounded and is constant after time T_ε. (ii) The values are determined by the conditional probability law (the expression defines the
determined by the conditional probability law (the expression defines the

functions q_{nk}^a(·))
$$P\big\{dF^a(n\delta) = k\rho,\ dF^b(n\delta) = 0,\ b \ne a \mid x,\ F(i\delta), i < n,\ w(s), N(s), s \le n\delta\big\}$$
$$= q_{nk}^a\big(k\rho;\ x,\ F(i\delta), i < n,\ w(p\theta), N(p\theta, \Gamma_j), j \le q,\ p\theta < n\delta\big), \tag{2.7}$$
where the q_{nk}^a(·) are continuous with probability one in the (x, w) variables, for each value of the other variables.

The final convergence theorem can now be stated.

Theorem 2.6. Under the conditions of this section, (A2.1) and (A2.2), V^h(x) → V(x).

Proof. We need to adapt the ε-optimal control of Theorem 2.5 for the approximating Markov chain. In preparation for the argument, let us first note the following. Suppose that we are given a control of "impulsive magnitude" v_i dF^i acting at a time t_0. Let the other components F^j(·), j ≠ i, of the control be zero. Thus, the associated instantaneous change of state is v_i dF^i. We wish to adapt this control for use on the approximating chain. To do this, first define n_h = min{k : τ_k^h ≥ t_0}. Then, starting at step n_h of the approximating chain, we approximate v_i dF^i by applying a succession of admissible control steps in conditional mean direction v_i, each of the randomized type described in Section 8.3. In more detail, let E_k^h Δξ_k^h = v_i ΔF_k^{h,i}, k ≥ n_h, denote the sequence of "conditional means," as in Section 8.3. Continue until Σ_k ΔF_k^{h,i} sums to dF^i (possibly modulo a term which goes to zero as h → 0). There might be some reflection steps intervening if G is ever exited. Let F^h(·) denote the continuous parameter interpolation of the control process just defined. Because the interpolation interval at a control or reflection step is zero, all of the "conditional mean" jumps v_i ΔF_k^{h,i} occur simultaneously in the interpolation, and the sequence {F^h(·)} is tight. Also, the associated reflection terms converge weakly to the zero process. Thus, the weak limit is just the piecewise constant control process with a single jump, which is at time t_0 and has the value v_i dF^i.
With the above example in mind, we are now ready to define the adapted form of the ε-optimal control F(·) given in Theorem 2.5. Let F^h(·) denote the continuous parameter interpolation of this adaptation. F^h(·) will be defined so that it is piecewise constant and has the same number of jumps that F(·) has (at most T_ε/δ). Each of the jumps of each component of the control is to be realized for the chain in the manner described in the above paragraph. The limit of the associated sequence (ψ^h(·), z^h(·), F^h(·), w^h(·), N^h(·)) will have the distribution of the (x(·), z(·), F(·), w(·), N(·)) of Theorem 2.5. The nonzero jumps for the approximating chain are to occur as soon after the interpolated times nδ as possible, analogous to the situation in Theorem 10.5.2.

Next, determine the jumps dF_n^{h,i} of the control values for the adaptation of F(·) to the approximating chain by the conditional probability law:
$$P\big\{dF_n^{h,a} = k\rho,\ dF_n^{h,b} = 0,\ b \ne a \mid x,\ F^h(i\delta), i < n,\ \psi^h(s), w^h(s), N^h(s), U^h(s), Y^h(s), s \le n\delta\big\}$$
$$= q_{nk}^a\big(k\rho;\ x,\ F^h(i\delta), i < n,\ w^h(p\theta), N^h(p\theta, \Gamma_j), j \le q,\ p\theta < n\delta\big). \tag{2.8}$$
Let v_a dF^{a,h}(nδ) denote the jump magnitude chosen by the law (2.8) for the chain at time nδ. Now, we must "realize" this value. Following the method of Theorem 10.5.2, define ν_n^h = min{k : τ_k^h > nδ}. Now, starting at step ν_n^h, realize v_a dF^{a,h}(nδ) by the method of the first paragraph of the proof. Then, continuing to follow the method of Theorem 10.5.2, apply Theorems 2.3 and 2.4 to get that
$$V^h(x) \le W^h(x, F^h) \to W(x, F) \le V(x) + \epsilon,$$
which yields the theorem. ∎

11.3 The Ergodic Cost Problem


The system model is the reflected jump diffusion (1.1) and, analogously to (1.3a) and (1.3b), respectively, the cost function takes either the form
$$\gamma(x, m) = \limsup_T \frac{1}{T}\, E_x^m \int_0^T \Big[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + c'(x(t))\,dz(t)\Big], \tag{3.1a}$$
where c(·) is continuous, or, for the case where G is a convex polyhedron,
$$\gamma(x, m) = \limsup_T \frac{1}{T}\, E_x^m \int_0^T \Big[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + \sum_i c_i\,dY_i(t)\Big]. \tag{3.1b}$$
The limit might depend on x. Let γ̄ denote the minimal cost over all nonanticipative controls, and suppose that it does not depend on the initial condition. Throughout this section, assume (A10.1.1) to (A10.1.4), (A1.3), and the assumptions on the set G and reflection directions of Section 5.7. The chain {ξ_n^h, n < ∞} is assumed to be locally consistent with the reflected jump diffusion in the sense of Section 5.7. We use the notation of Section 11.1 and continue to suppose that x + q(x, ρ) ∈ G for x ∈ G and ρ ∈ Γ.
In Chapter 7, the dynamic programming equations were defined under specific conditions on the approximating chains. Those conditions are more than are needed for the convergence theorems, and we need only assume here that the various chains which appear below are stationary. The convergence results are essentially consequences of those of Section 11.1. Let us assume the following condition.

A3.1. For each small h there is an optimal feedback (x dependent only) control u^h(·) for the chain {ξ_n^h, n < ∞}.

Condition (A3.1) holds if there is a solution to the Bellman equation (7.5.10) and for each feedback control the state space consists of a single communicating class and a transient set. Some conditions which guarantee this are given in Chapter 7. There will be at least one invariant measure under the control u^h(·). Choose any one and, in Theorem 3.1 below, let {ξ_n^h, n < ∞} and ψ^h(·) be the stationary processes which are associated with the chosen invariant measure, and let m^h(·) be the relaxed control representation of u^h(ψ^h(·)). We write z(·) = (Y(·), U(·)).

Theorem 3.1. Assume the conditions of this section and (A3.1). Then Theorems 1.1 to 1.4 and 1.8 hold. Let h index a weakly convergent subsequence of {Ψ^h(·)} with limit denoted by Ψ(·). Then Ψ(·) satisfies (1.1) and (1.2) and the distribution of
$$\big(x(t),\ m(t + \cdot) - m(t),\ z(t + \cdot) - z(t)\big)$$
does not depend on t.

Proof. In view of the cited theorems, only the stationarity needs to be proved. By the stationarity of the chosen ψ^h(·), the distribution of
$$\big(\psi^h(t),\ m^h(t + \cdot) - m^h(t),\ z^h(t + \cdot) - z^h(t)\big)$$
does not depend on t. This and the weak convergence (Theorem 1.8) yield the theorem. ∎

Convergence of the Costs. Let x be the initial condition of interest. If all states communicate under u^h(·), then the invariant measure is unique and the minimum cost γ̄^h does not depend on x. Otherwise, the state space S_h = G_h ∪ ∂G_h^+ is divided into disjoint communicating sets and a set of transient states. In this case, to fix ideas, we suppose that x is not a transient state for any h of interest, and use the stationary process associated with the invariant measure for the communicating set in which x lies. Let {ξ_n^h, n < ∞} and ψ^h(·) be the optimal stationary discrete and continuous parameter chains, and let h index a weakly convergent subsequence. If x is a transient state for a sequence of h → 0, then the appropriate stationary process ψ^h(·) can be constructed as a randomization among the stationary processes associated with the various communicating classes, and essentially the same proof can be used.
Using the weak convergence (Theorem 1.8) and the uniform integrability of (1.27), for (1.3a) we can write, where m^h(·) is the relaxed control

representation of the optimal control u^h(·),
$$\bar\gamma^h(x) = \gamma^h(x, u^h) = E^{u^h} \int_0^1 \Big[\int_U k(\psi^h(t), \alpha)\,m_t^h(d\alpha)\,dt + c'(\psi^h(t))\,dz^h(t)\Big]$$
$$\to E^m \int_0^1 \Big[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + c'(x(t))\,dz(t)\Big]$$
$$= \lim_T \frac{1}{T}\, E^m \int_0^T \Big[\int_U k(x(t), \alpha)\,m_t(d\alpha)\,dt + c'(x(t))\,dz(t)\Big] = \gamma(m) \ge \bar\gamma, \tag{3.2}$$
where γ(m) is the cost for the limit stationary process. The same development holds for (1.3b).
As in the previous sections, to complete the proof that
$$\bar\gamma^h(x) \to \bar\gamma, \tag{3.3}$$
we need to find a nice comparison control. This is not as easy to do for


the ergodic cost problem as it was for the discounted problem or for the
other problems where the cost was of interest for essentially a finite time.
The next result uses the following additional condition for either (1.3a) or
(1.3b). It will be weakened subsequently.

A3.2. For each ε > 0, there is a continuous feedback control u^ε(·) which is ε-optimal with respect to all admissible controls, and under which the solution to (1.1) and (1.2) is weak sense unique for each initial condition and has a unique invariant measure.

Remark. The condition does not seem to be too restrictive. If a(x) is nondegenerate for each x ∈ G, the set (b(x, U), k(x, U)) is convex for each x, and there is a unique weak sense solution to the uncontrolled problem for each initial condition, then the method of [91] can be applied to the problem with a reflecting boundary to get that there is always a smooth ε-optimal control. This is fully developed in [100, Chapter 4]. The continuity of b(·), k(·), and c(·) can be weakened. Following the idea in Chapter 10, suppose that b(x, α) takes either of the forms b_0(x) + b_1(α) or b_0(x)b_1(α), or is a product or sum of such terms, where the control dependent terms are continuous and the x-dependent terms are measurable. Let μ(·) denote the associated invariant measure and D the set of discontinuity. It is sufficient that
$$\mu(D) = 0.$$
The functions k(·), c(·) can be treated in the same way.

Theorem 3.2. Assume the conditions of this section, (A3.1) and (A3.2).
Then (3.3) holds.

Proof. We use the same conventions concerning the chains and the initial state x as given above (3.2), except that the u^ε(·) of (A3.2) replaces u^h(·). Let {ξ_n^h, n < ∞} and ψ^h(·) denote the stationary chains under the control u^ε(·). Then, the weak convergence (Theorem 1.8), the uniform integrability of (1.27), and the stationarity of the limit yield that, for (1.3a),
$$\bar\gamma^h(x) \le \gamma^h(x, u^\epsilon) = E^{u^\epsilon} \int_0^1 \big[k(\psi^h(t), u^\epsilon(\psi^h(t)))\,dt + c'(\psi^h(t))\,dz^h(t)\big]$$
$$\to E^{u^\epsilon} \int_0^1 \big[k(x(t), u^\epsilon(x(t)))\,dt + c'(x(t))\,dz(t)\big]$$
$$= \lim_T \frac{1}{T}\, E^{u^\epsilon} \int_0^T \big[k(x(t), u^\epsilon(x(t)))\,dt + c'(x(t))\,dz(t)\big] = \gamma(u^\epsilon) \le \bar\gamma + \epsilon, \tag{3.4}$$
with the analogous result for (1.3b), which yields the theorem. ∎

A Weaker Condition. In applications, there is usually a smooth ε-optimal feedback control as needed in (A3.2). If this cannot be guaranteed, the following weakening of (A3.2) can be used, under the other conditions of this section. Let a(x) be nondegenerate for each x, and assume that G is a convex polyhedron with constant direction of reflection on the relative interior of each boundary face. Suppose that the number of boundary faces that can meet any edge or corner is no greater than the dimension r and that b(·, 0) and a(·) are Lipschitz continuous, where b(·, 0) denotes the uncontrolled drift term. Then there is a control which is analogous to that derived in Theorem 10.3.1 which can be used to show that liminf_h γ̄^h(x) ≤ γ̄. The development requires the introduction of "relaxed feedback controls" and "occupation measures" and proceeds in a manner which is parallel to the development of a similar result for the convergence of ergodic cost functions for the heavy traffic approximations in [100, Section 4.8], to which the reader is directed.

Stationary Measures. The convergence of the stationary processes ψ^h(·) to a stationary process x(·) implies that any weak limit of the stationary measures μ^h(·) of (7.5.7) is a stationary measure for x(·). This is true for the case where either a fixed continuous feedback control (x dependent only) u(·) is applied to {ξ_n^h, n < ∞} for all h, or for the optimally controlled chain. Of course, in the latter case, the limit might not be Markov. If a fixed continuous feedback control u(·) is applied and the stationary measure μ(·) of x(·) is unique under that control, then μ^h(·) ⇒ μ(·).
The continuity condition can be weakened. Let u(·) be measurable, with D_u denoting the set of discontinuities and μ(·) the associated invariant measure. Then the convergence still holds if μ(D_u) = 0.
12 Finite Time Problems and Nonlinear Filtering

The problems considered in Chapters 10 and 11 were of interest over an unbounded time interval, and one did not need to keep track of the actual
value of the current time. The cost functions were all over an unbounded
interval, whether of the discounted form, of the form where control stops
when a set is first exited, or of the average cost per unit time type. Time
did not enter explicitly into the stopping criterion. All of the results and
problem formulations (except for the average cost per unit time formula-
tion) can be readily extended to problems of interest over a given bounded
interval only. Owing to the explicit use of a bounded time interval, there
are several variations of the locally consistent approximating Markov chains
which might be used. They can all be easily derived from those of Chapter
5. The approximating chains are loosely divided into the "explicit" and "im-
plicit" classes, depending on the treatment of time, somewhat analogous
to the classification in classical numerical analysis. Section 12.1 contains
an example to motivate the general form of the "explicit" method, and the
general case is dealt with in Section 12.2. A motivating example for the
"implicit" method appears in Section 12.3, and the general case is given
in Section 12.4. Various combinations of these methods can be used. An
optimal control problem is formulated in Section 12.5, and the numerical
questions as well as the convergence of the algorithms are dealt with in
Section 12.6. It turns out that the natural analogues of all of the models of
control problems of the previous chapters (with the above cited "ergodic"
exception) can be dealt with by the previous proofs with little change. But
only the essential points will be covered. The methods are developed as
direct extensions of the Markov chain approximation ideas of the previous

chapters.
Section 12.7 concerns the problem of numerical approximation for the
nonlinear filtering problem. It is shown how the previous Markov chain ap-
proximations can be used to get effective numerical approximations under
quite weak conditions. For simplicity of notation, the dynamical terms and
cost rates will not depend explicitly on time. The alterations required for
the time dependent case should be obvious. The method is equivalent to
using an optimal filter for an approximating process, which is the approxi-
mating Markov chain, but using the actual physical observations. Conver-
gence is proved. The general idea is quite versatile and has applications
to other problems in "approximately optimal" filtering [102, 98, 17, 18].
Each step of the method is effectively divided into two steps: updating
the conditional probability via the dynamics, and then incorporating the
observation.

12.1 The Explicit Approximation Method: An Example
In Chapters 10 and 11, where Markov chain approximations and algorithms
of the type introduced in Chapters 5 and 6, respectively, were used, the time
interval of interest was essentially unbounded. It was either infinity or else
was the time required for the process to exit a given set. The terminal time
does not appear explicitly in the algorithms of Section 5.8 or in Chapters 7
and 8. In a sense, the convergence proofs in Chapters 10 and 11 are prob-
abilistic "extensions" of the convergence proofs in numerical analysis for
approximations to elliptic partial differential or partial differential integral
equations. In many cases, the time interval on which the control problem
is of interest is bounded by a finite number T, and then the basic numeri-
cal problem is analogous to the solution of a (perhaps nonlinear) parabolic
partial differential equation. In that case, unless the interpolation inter-
val fith(x, a) is independent of x and a, the Markov chain approximations
given in Chapter 5 and the algorithms of Chapter 6 cannot be used without
some modification.
In classical numerical analysis, the approximations which are in use for
elliptic and parabolic equations are closely related, and the common proce-
dures for the parabolic case can be easily obtained from analogous common
procedures for the elliptic case. For the parabolic problem, the time param-
eter needs to be accounted for explicitly. The appropriate approximations
are easily obtained from those of Chapter 5. When solving parabolic equations by finite difference methods, one can choose either an "explicit" or an "implicit" finite difference approximation. With appropriately chosen
difference approximations, each can be viewed as a Markov chain approx-
imation method in much the same sense that the Markov chain approxi-

mations of Sections 5.2 and 5.3 could be seen as interpretations of standard finite difference approximations. Loosely speaking, one can divide the Markov chain approximations for the "fixed" terminal time problem into the classes "explicit" and "implicit." Both classes can be easily obtained from any Markov chain approximation to the underlying diffusion or jump diffusion process which is locally consistent in the sense of Chapter 5. As in Chapter 5, finite difference methods are used only by way of example.
Let E_{x,t}^u denote the expectation of functionals on [t, T] conditioned on x(t) = x and the use of admissible control u(·). In order to facilitate our understanding of how to get the locally consistent Markov chain approximations and algorithms which are appropriate for the finite time problem, we first do a one dimensional example which is a finite time analogue of Example 4 of Section 5.1. The example will motivate the general technique for the explicit method.

Example. Let T < ∞ and, for δ > 0, let N_δ = T/δ be an integer. Let the system x(·) be defined by (5.1.20), and let k(·) and g(·) be smooth and bounded real valued functions. As in Section 5.1, T, k(·), and g(·) play only an auxiliary role in the construction of the approximating chain. For t < T and a feedback control u(·), define the cost function
$$W(x, t, u) = E_{x,t}^u \Big[\int_t^T k(x(s), u(x(s), s))\,ds + g(x(T))\Big].$$
More general boundary conditions will be used in Section 12.5 below. By Subsection 3.1.5, the function W(x, t, u) formally satisfies the partial differential equation
$$W_t(x, t, u) + \mathcal{L}^{u(x,t)} W(x, t, u) + k(x, u(x, t)) = 0, \quad t < T, \tag{1.1}$$
with boundary condition W(x, T, u) = g(x), where the differential operator L^α is defined by
$$\mathcal{L}^\alpha f(x) = b(x, \alpha)\frac{\partial f(x)}{\partial x} + \frac{1}{2}\sigma^2(x)\frac{\partial^2 f(x)}{\partial x^2}.$$
We next define a finite difference approximation to (1.1), with interval h > 0. For concreteness, we follow (5.1.8) and use the "one sided" difference approximation of the first derivative W_x. The two sided difference can also (and should) be used if σ^2(x) − h|b(x, u(x))| ≥ 0 for all x, as below (5.1.17).

We use the "explicit" forms
$$f_t(x, t) \to \frac{f(x, t + \delta) - f(x, t)}{\delta},$$
$$f_x(x, t) \to \frac{f(x + h, t + \delta) - f(x, t + \delta)}{h} \quad \text{if } b(x, u(x, t)) \ge 0,$$
$$f_x(x, t) \to \frac{f(x, t + \delta) - f(x - h, t + \delta)}{h} \quad \text{if } b(x, u(x, t)) < 0, \tag{1.2}$$
$$f_{xx}(x, t) \to \frac{f(x + h, t + \delta) + f(x - h, t + \delta) - 2f(x, t + \delta)}{h^2}.$$

Substituting (1.2) into (1.1), letting W^{h,δ}(x, t, u) denote the solution to the finite difference equation with x an integral multiple of h and nδ < T, collecting terms, and multiplying all terms by δ yields the expression
$$W^{h,\delta}(x, n\delta, u) = W^{h,\delta}(x, n\delta + \delta, u)\Big[1 - \sigma^2(x)\frac{\delta}{h^2} - |b(x, u(x, n\delta))|\frac{\delta}{h}\Big]$$
$$+ W^{h,\delta}(x + h, n\delta + \delta, u)\Big[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^+(x, u(x, n\delta))\frac{\delta}{h}\Big]$$
$$+ W^{h,\delta}(x - h, n\delta + \delta, u)\Big[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^-(x, u(x, n\delta))\frac{\delta}{h}\Big]$$
$$+ k(x, u(x, n\delta))\delta, \tag{1.3}$$
with the boundary condition W^{h,δ}(x, T) = g(x). The method is called the explicit method because the equation (1.3) can be solved recursively: The values of W^{h,δ} at time nδ can be obtained by a simple iteration from the values at time nδ + δ. That is, the time variable on the right side of (1.3) equals nδ + δ and it is nδ on the left side. This is due to the fact that the spatial derivatives are approximated at time nδ + δ.
Note that the sum of the coefficients of the W^{h,δ} terms in (1.3) is unity. Suppose that the coefficient
$$\Big[1 - \sigma^2(x)\frac{\delta}{h^2} - |b(x, u(x, n\delta))|\frac{\delta}{h}\Big]$$
of the W^{h,δ}(x, nδ + δ, u) term is nonnegative. Then the coefficients can be considered to be the transition function of a Markov chain. Defining p^{h,δ}(x, y|α) in the obvious way, we then rewrite (1.3) as
$$W^{h,\delta}(x, n\delta, u) = \sum_y p^{h,\delta}(x, y|u(x, n\delta))\,W^{h,\delta}(y, n\delta + \delta, u) + k(x, u(x, n\delta))\delta. \tag{1.4}$$
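For concreteness, the coefficients appearing in (1.3)-(1.4) are easy to compute. The following is only an illustrative Python sketch (not part of the original text); `b` and `sigma` are hypothetical model functions, and the assertion on the self-transition probability is the nonnegativity condition discussed above.

```python
def explicit_probs_1d(x, alpha, h, delta, b, sigma):
    """Transition probabilities p^{h,delta}(x, y | alpha) of the explicit chain
       for the one dimensional example, read off from (1.3)."""
    bb = b(x, alpha)
    s2 = sigma(x) ** 2
    p_plus  = s2 * delta / (2 * h * h) + max(bb, 0.0) * delta / h    # to x + h
    p_minus = s2 * delta / (2 * h * h) + max(-bb, 0.0) * delta / h   # to x - h
    p_stay  = 1.0 - s2 * delta / (h * h) - abs(bb) * delta / h       # to x itself
    assert p_stay >= 0.0, "need delta <= h^2 / (sigma^2 + h|b|) for nonnegativity"
    return {x + h: p_plus, x - h: p_minus, x: p_stay}

# example use with hypothetical coefficients b(x,a) = -x + a, sigma(x) = 0.3
probs = explicit_probs_1d(x=0.5, alpha=0.0, h=0.1, delta=0.002,
                          b=lambda x, a: -x + a, sigma=lambda x: 0.3)
assert abs(sum(probs.values()) - 1.0) < 1e-12   # the coefficients sum to unity
```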

The p^{h,δ}(x, y|α) are the transition probabilities for a controlled Markov chain. Let the associated Markov chain be denoted by {ξ_n^{h,δ}, n < ∞}. Note that
$$E_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = b(x, \alpha)\delta, \qquad \mathrm{cov}_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = \sigma^2(x)\delta + O(h\delta).$$
Let δ → 0 and h → 0 together. By analogy to the definition (4.1.3), we say that the "explicit" controlled Markov chain {ξ_n^{h,δ}, n < ∞} with interpolation interval δ > 0 is locally consistent with x(·) if
$$E_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = b(x, \alpha)\delta + o(\delta), \qquad \mathrm{cov}_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = \sigma^2(x)\delta + o(\delta). \tag{1.5}$$
Thus, the constructed chain and interpolation interval are locally consistent with (5.1.20).
Define the piecewise constant continuous parameter interpolation ξ^{h,δ}(·) by ξ^{h,δ}(t) = ξ_n^{h,δ} on the interval [nδ, nδ + δ). (See Section 4.2.) With the boundary condition W^{h,δ}(x, T, u) = g(x) and t = nδ < T, the solution to (1.4) can be written as
$$W^{h,\delta}(x, t, u) = E_{x,n}^u \Big[\sum_{i=n}^{N_\delta - 1} k(\xi_i^{h,\delta}, u(\xi_i^{h,\delta}, i\delta))\,\delta + g(\xi_{N_\delta}^{h,\delta})\Big]$$
$$= E_{x,t}^u \Big[\int_t^T k(\xi^{h,\delta}(s), u(\xi^{h,\delta}(s), s))\,ds + g(\xi^{h,\delta}(T))\Big].$$
This expression is an approximation to W(x, t, u) if the process ξ^{h,δ}(·) is an approximation to x(·).
One can also define an analogue ψ^{h,δ}(·) of the continuous parameter Markov chain interpolation ψ^h(·) of Section 4.3 and an appropriate cost function. The details are left to the reader. The continuous parameter interpolation is just an intermediary in getting the convergence of the cost functions, and either interpolation can be used with the same results.

Remarks. The example shows that one can treat the fixed terminal time problem quite similarly to the unbounded time problem. Notice the following important point. Let p^h(x, y|α) be the transition probabilities in Section 5.1, Example 4. Then, for y ≠ x,
$$p^{h,\delta}(x, y|\alpha) = p^h(x, y|\alpha) \times \text{normalization}(x), \tag{1.6}$$
where
$$\text{normalization}(x) = 1 - p^{h,\delta}(x, x|\alpha) = \frac{\delta}{h^2}\,Q^h(x, \alpha),$$

where Q^h(x, α) = σ^2(x) + h|b(x, α)|, the normalization in (5.1.22).
A similar relationship exists in the multidimensional problem and will be pursued and formalized in the next section. The equivalence exists because the only difference between this example and Example 4 of Section 5.1 is the presence of the W_t in (1.1), which causes a transition of each x to itself owing to the form of the finite difference approximation of the first line of (1.2). In general, as will be seen in the next section, we need not use a finite difference based method to get the explicit Markov chain approximation. It can be obtained directly from the approximations of Chapter 5.

12.2 The General Explicit Approximation Method


The above one dimensional example and special state space can be easily generalized by using the observations made in Section 12.1 as a guide and applying them to any of the calculated transition probabilities of Chapter 5, whether of the finite difference or of the so-called direct method types. The method is referred to as "explicit" because of the analogy with the example of Section 12.1. Time advances by δ at each step. Suppose that we are given transition probabilities p^h(x, y|α) and an interpolation interval Δt^h(x, α) which are locally consistent with the controlled diffusion (5.3.1) or jump diffusion (5.6.1). Let the desired interpolation interval δ > 0 be given. Let p^{h,δ}(x, y|α) (to be defined below) denote the transition probabilities for the approximating chain for the explicit method. Then, following the guide of the example in Section 12.1, for x ≠ y we wish to have the "proportionality"
$$\frac{p^{h,\delta}(x, y|\alpha)}{1 - p^{h,\delta}(x, x|\alpha)} = p^h(x, y|\alpha). \tag{2.1}$$
That is, under the control parameter α and given that ξ_n^{h,δ} = x, the conditional probability that ξ_{n+1}^{h,δ} = y ≠ x, given that the state changes, equals p^h(x, y|α). Let δ → 0 and h → 0 together. Then, the requirement that both pairs (p^h(x, y|α), Δt^h(x, α)) and (p^{h,δ}(x, y|α), δ) be locally consistent implies that, modulo small error terms,
$$b(x, \alpha)\delta = \sum_y (y - x)\,p^{h,\delta}(x, y|\alpha) = \sum_y (y - x)\,p^h(x, y|\alpha)\big(1 - p^{h,\delta}(x, x|\alpha)\big) = b(x, \alpha)\,\Delta t^h(x, \alpha)\big(1 - p^{h,\delta}(x, x|\alpha)\big). \tag{2.2}$$
Equation (2.2) implies that (modulo small error terms)
$$1 - p^{h,\delta}(x, x|\alpha) = \frac{\delta}{\Delta t^h(x, \alpha)} \in (0, 1]. \tag{2.3}$$

Thus, given (p^h(x, y|α), Δt^h(x, α)) and a δ > 0 satisfying
$$\delta \le \min_{x,\alpha} \Delta t^h(x, \alpha), \tag{2.4}$$
we get the transition probabilities p^{h,δ} for the explicit method and the associated approximating Markov chain {ξ_n^{h,δ}, n < ∞} from (2.1) and (2.3).
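The construction (2.1), (2.3) is straightforward to mechanize. The following minimal Python sketch (an illustration only, not from the original text) takes a locally consistent pair in the form of hypothetical callables `p_h` and `dt_h` and returns the explicit-method transition rows; it assumes p^h(x, x|α) = 0, which is the usual case for the chains of Chapter 5.

```python
def explicit_from_consistent(p_h, dt_h, delta, states, alpha):
    """Build p^{h,delta}(x, . | alpha) from (p^h, Delta t^h) via (2.1) and (2.3).
       p_h(x, y, alpha): locally consistent transition probability (p_h(x,x,alpha)=0 assumed),
       dt_h(x, alpha):   its interpolation interval,
       delta:            interpolation interval satisfying (2.4)."""
    rows = {}
    for x in states:
        move = delta / dt_h(x, alpha)     # = 1 - p^{h,delta}(x, x | alpha), from (2.3)
        assert 0.0 < move <= 1.0, "delta violates (2.4)"
        row = {y: move * p_h(x, y, alpha) for y in states if y != x}   # (2.1)
        row[x] = 1.0 - move               # probability of staying put for one delta-step
        rows[x] = row
    return rows
```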

12.3 The Implicit Approximation Method: An Example
The fundamental difference between the so-called explicit and implicit approaches to the Markov chain approximation lies in the fact that in the former the time variable is treated differently than the state variables: It is a true "time" variable, and its value increases by a constant δ at each step. In the implicit approach, the time variable is treated as just another state variable. It is discretized in the same manner as are the other state variables: The approximating Markov chain has a state space which is a discretization of the (x, t)-space, and the component of the state of the chain which comes from the original time variable does not necessarily increase its value at each step. The basic approach for the implicit scheme is best illustrated by a comparison of the explicit and implicit finite difference approximations to the solution of (1.1). Instead of (1.2), use the approximations (again, using a one sided difference approximation for W_x for specificity)
$$f_t(x, t) \to \frac{f(x, t + \delta) - f(x, t)}{\delta},$$
$$f_x(x, t) \to \frac{f(x + h, t) - f(x, t)}{h} \quad \text{if } b(x, u(x, t)) \ge 0,$$
$$f_x(x, t) \to \frac{f(x, t) - f(x - h, t)}{h} \quad \text{if } b(x, u(x, t)) < 0, \tag{3.1}$$
$$f_{xx}(x, t) \to \frac{f(x + h, t) + f(x - h, t) - 2f(x, t)}{h^2}.$$
Note that the last three equations of (3.1) use the value t on the right side rather than t + δ as in (1.2). Using (3.1) and repeating the procedure which

led to (1.4), for nδ < T we get the standard finite difference approximation
$$\Big[1 + \sigma^2(x)\frac{\delta}{h^2} + |b(x, u(x, n\delta))|\frac{\delta}{h}\Big] W^{h,\delta}(x, n\delta, u)$$
$$= \Big[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^+(x, u(x, n\delta))\frac{\delta}{h}\Big] W^{h,\delta}(x + h, n\delta, u)$$
$$+ \Big[\frac{\sigma^2(x)}{2}\frac{\delta}{h^2} + b^-(x, u(x, n\delta))\frac{\delta}{h}\Big] W^{h,\delta}(x - h, n\delta, u)$$
$$+ W^{h,\delta}(x, n\delta + \delta, u) + k(x, u(x, n\delta))\delta.$$
With the obvious definitions of p^{h,δ} and Δt̂^{h,δ}, we can rewrite the above expression as
$$W^{h,\delta}(x, n\delta, u) = \sum_y p^{h,\delta}(x, n\delta; y, n\delta \mid u(x, n\delta))\,W^{h,\delta}(y, n\delta, u)$$
$$+ p^{h,\delta}(x, n\delta; x, n\delta + \delta \mid u(x, n\delta))\,W^{h,\delta}(x, n\delta + \delta, u) + k(x, u(x, n\delta))\,\Delta\hat t^{h,\delta}(x, u(x, n\delta)). \tag{3.2}$$
The p^{h,δ} are nonnegative and
$$\sum_y p^{h,\delta}(x, n\delta; y, n\delta \mid \alpha) + p^{h,\delta}(x, n\delta; x, n\delta + \delta \mid \alpha) = 1.$$

It can be seen from this that we can consider the p^{h,δ} as a one step transition probability of a Markov chain {ζ_n^{h,δ}, n < ∞} on the "(x, t)-state space"
$$\{0, \pm h, \pm 2h, \ldots\} \times \{0, \delta, 2\delta, \ldots\}.$$
It is evident that time is being considered as just another state variable. Refer to Figure 12.2b for an illustration of the transitions. Note that, analogously to (1.6), for x ≠ y we have
$$p^{h,\delta}(x, n\delta; y, n\delta \mid \alpha) = p^h(x, y|\alpha) \times \text{normalization}(x), \tag{3.3}$$
where the p^h(x, y|α) are the transition probabilities of Example 4 of Section 5.1. Analogously to what was done in Section 12.2, this relationship will be used in Section 12.4 to get a general implicit Markov chain approximation, starting with any consistent (in the sense of Chapter 4) approximation.
Write ζ_n^{h,δ} = (ξ_n^{h,δ}, ζ_{n,0}^{h,δ}), where the 0-th component ζ_{n,0}^{h,δ} represents the time variable, and ξ_n^{h,δ} represents the original "spatial" state. Then we have
$$E_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = b(x, \alpha)\,\Delta\hat t^{h,\delta}(x, \alpha),$$
$$\mathrm{cov}_{x,n}^{h,\alpha}\,\Delta\xi_n^{h,\delta} = \sigma^2(x)\,\Delta\hat t^{h,\delta}(x, \alpha) + \Delta\hat t^{h,\delta}(x, \alpha)\,O(h),$$
$$E_{x,n}^{h,\alpha}\,\Delta\zeta_{n,0}^{h,\delta} = \Delta\hat t^{h,\delta}(x, \alpha).$$

Thus, the "spatial" component of the controlled chain is locally consistent


with {5.1.20). The conditional mean increment of the "time" component
of the state is Llih· 6(x, a:). We have constructed an approximating Markov
chain via an "implicit" method. It is called an implicit method because
(3.2) cannot be solved by a simple backward iteration. At each n, (3.2)
determines the values of the {Wh· 6 (X' no, u)} implicitly.
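Because the values at time nδ appear on both sides of (3.2), each backward step requires solving a linear system. The following Python sketch (an illustration only, not from the original text) performs one such step for the uncontrolled one dimensional example on a truncated grid; `b`, `sigma`, `k` and the terminal data are hypothetical, and the neighbor terms at the ends of the truncated grid are simply dropped as a simplification made for this sketch.

```python
import numpy as np

def implicit_step_1d(W_next, xs, h, delta, b, sigma, k):
    """One backward step of the implicit scheme (3.2), no control:
       solve A W_now = W_next + k*delta, with A built from the (3.2) coefficients."""
    n = len(xs)
    A = np.zeros((n, n))
    rhs = W_next + delta * np.array([k(x) for x in xs])
    for i, x in enumerate(xs):
        s2, bb = sigma(x) ** 2, b(x)
        cp = s2 * delta / (2 * h * h) + max(bb, 0.0) * delta / h    # coefficient of W(x+h)
        cm = s2 * delta / (2 * h * h) + max(-bb, 0.0) * delta / h   # coefficient of W(x-h)
        A[i, i] = 1.0 + s2 * delta / (h * h) + abs(bb) * delta / h
        if i + 1 < n:
            A[i, i + 1] = -cp
        if i - 1 >= 0:
            A[i, i - 1] = -cm
    return np.linalg.solve(A, rhs)

xs = np.linspace(-1.0, 1.0, 41)                 # grid with spacing h = 0.05
W = np.array([x * x for x in xs])               # hypothetical terminal data g(x) = x^2
for _ in range(100):                            # 100 backward steps of size delta
    W = implicit_step_1d(W, xs, h=0.05, delta=0.01,
                         b=lambda x: -x, sigma=lambda x: 0.3, k=lambda x: 0.0)
```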

12.4 The General Implicit Approximation Method


With the illustration of the above section in mind, we now describe the gen-
eral method. The procedure follows the general idea used in Section 12.2.
As in Section 12.2, let ph(x, ylo:) and fl.th(x, a:) be a transition function and
interpolation interval, respectively, which are locally consistent with either
the controlled diffusion (5.3.1) or the controlled jump diffusion (5.6.1). Let
o > 0 be given. Let ph•6 and Llfh· 6 denote the (to be defined) transition
probability and interpolation interval, respectively, for the implicit approx-
imation. Analogously to (2.1), the example in Section 12.3 suggests that,
for x -:f y, we use the relationship

h ph• 6(x,no;y,nolo:)
P (x, Ylo:) = 1- p'
Ah 6( x,no;x,no +o I0: )" (4.1)

Thus, to get the transition probabilities ph• 6 , we need only get

ph•6(x, no; x, no+ nolo:).

This will be done via the local consistency requirements on both (ph(x, ylo:),
fl.th(x, a:)) and the corresponding quantities for the implicit approximation.
The conditional mean one step increment of the "time" component of (~· 6
is
Eh,a fl.1"h• 6 = PAh·6(x ' no·' X ' no+ olo:)o'
z,n ~n,O
{4.2)

and we define the interpolation interval fl.ih· 6 (x, a:) by (4.2). Of course, we
can always add a term of smaller order. The consistency equation for the
spatial component ((~· 6 ) of the chain is (modulo a negligible error term)

b(x, o:)fl.th(x, a:) = L)Y- x)ph(x, ylo:)


y
ph• 6 (x, no; y, nolo:)
= ~(y-x)1-ph,6(x,no;x,no+olo:) <4·3)
Llih•6(x, a:)
= b(x, a:) 1- ph,6(x, no; x, no+ olo:)"
By equating the two expressions (4.2) and (4.3) for the interpolation

interval Δt̂^{h,δ}(x, α), we have
$$\Delta\hat t^{h,\delta}(x, \alpha) = \big(1 - p^{h,\delta}(x, n\delta; x, n\delta + \delta \mid \alpha)\big)\,\Delta t^h(x, \alpha) = p^{h,\delta}(x, n\delta; x, n\delta + \delta \mid \alpha)\,\delta.$$
Thus,
$$p^{h,\delta}(x, n\delta; x, n\delta + \delta \mid \alpha) = \frac{\Delta t^h(x, \alpha)}{\Delta t^h(x, \alpha) + \delta}, \qquad \Delta\hat t^{h,\delta}(x, \alpha) = \frac{\delta\,\Delta t^h(x, \alpha)}{\Delta t^h(x, \alpha) + \delta}. \tag{4.4}$$
A short calculation shows that this equals what we obtained in the example of Section 12.3.
Note that Δt̂^{h,δ}(x, α) goes to Δt^h(x, α) as δ/Δt^h(x, α) goes to infinity. In particular, suppose that Δt^h(x, α) = O(h^2) and δ = O(h), as is common. Then
$$\Delta\hat t^{h,\delta}(x, \alpha) = \Delta t^h(x, \alpha)\,(1 + O(h)).$$
Generally, the implicit method provides a more accurate approximation than the explicit method.
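As with the explicit case, (4.1) and (4.4) give a direct recipe. A minimal Python sketch follows (an illustration only, not from the original text); `p_h` and `dt_h` are hypothetical callables for the locally consistent pair, as in the sketch of Section 12.2.

```python
def implicit_from_consistent(p_h, dt_h, delta, states, alpha):
    """For each x, return the time-advance probability
       p^{h,delta}(x,ndelta; x,ndelta+delta | alpha), the spatial transition
       probabilities p^{h,delta}(x,ndelta; y,ndelta | alpha), and the
       interpolation interval, all from (4.1) and (4.4)."""
    rows = {}
    for x in states:
        dt = dt_h(x, alpha)
        p_time = dt / (dt + delta)                    # (4.4): probability the time advances
        dt_hat = delta * dt / (dt + delta)            # (4.4): interpolation interval
        spatial = {y: (1.0 - p_time) * p_h(x, y, alpha)   # (4.1) rearranged
                   for y in states if y != x}
        rows[x] = dict(p_time=p_time, spatial=spatial, dt_hat=dt_hat)
    return rows
```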

The Solution to (3.2): Vector Case. Because the interpolation times are not necessarily constant, it is not necessarily possible to represent the solution to (3.2) as a sum of functions of a nonrandom number of steps of the approximating chain. But the solution can be represented in terms of the path up to the first time that the time component ζ_{n,0}^{h,δ} reaches or exceeds the value T. Define the stopping time
$$N^{h,\delta}(T) = \min\{n : \zeta_{n,0}^{h,\delta} \ge T\}.$$
Let u = {u_n^{h,δ}, n < ∞} be an admissible control sequence for {ζ_n^{h,δ}, n < ∞}. Define Δt̂_n^{h,δ} = Δt̂^{h,δ}(ξ_n^{h,δ}, u_n^{h,δ}), and let E_{x,n}^u denote the expectation given the use of u and ξ_n^{h,δ} = x, ζ_{n,0}^{h,δ} = nδ. Then the solution to (3.2) with the boundary condition W^{h,δ}(x, T, u) = g(x) can be written as

$$W^{h,\delta}(x,n\delta,u) = E_{x,n}^u\Bigl[\,\sum_{i=n}^{N^{h,\delta}(T)-1} k(\zeta_i^{h,\delta},u_i^{h,\delta})\,\Delta\bar t_i^{h,\delta} + g\bigl(\zeta_{N^{h,\delta}(T)}^{h,\delta}\bigr)\Bigr]. \qquad (4.5)$$

An equivalent representation to (4.5) is
$$W^{h,\delta}(x,n\delta,u) = E_{x,n}^u\Bigl[\,\sum_{i=n}^{N^{h,\delta}(T)-1} k(\zeta_i^{h,\delta},u_i^{h,\delta})\,\Delta\zeta_{i,0}^{h,\delta} + g\bigl(\zeta_{N^{h,\delta}(T)}^{h,\delta}\bigr)\Bigr]. \qquad (4.6)$$



Next, define the "interpolated" times $\bar t_n^{h,\delta} = \sum_{i=0}^{n-1}\Delta\bar t_i^{h,\delta}$, and define the continuous parameter interpolations $\zeta^{h,\delta}(\cdot)$, $\zeta_0^{h,\delta}(\cdot)$ and $u^{h,\delta}(\cdot)$ by
$$u^{h,\delta}(t) = u_n^{h,\delta}, \quad \zeta^{h,\delta}(t) = \zeta_n^{h,\delta}, \quad \zeta_0^{h,\delta}(t) = \zeta_{n,0}^{h,\delta}, \quad \text{for } t\in[\bar t_n^{h,\delta}, \bar t_n^{h,\delta}+\Delta\bar t_n^{h,\delta}). \qquad (4.7)$$
Then, for $t = n\delta$, we can write (4.5) and (4.6) as, respectively,
$$E_{x,t}^u\Bigl[\int_t^T k\bigl(\zeta^{h,\delta}(s),u^{h,\delta}(s)\bigr)\,ds + g\bigl(\zeta^{h,\delta}(T)\bigr)\Bigr],$$
$$E_{x,t}^u\Bigl[\int_t^T k\bigl(\zeta^{h,\delta}(s-),u^{h,\delta}(s-)\bigr)\,d\zeta_0^{h,\delta}(s) + g\bigl(\zeta^{h,\delta}(T)\bigr)\Bigr].$$
An Alternative Approximating Chain. There is an alternative Markov chain approximation $\{\tilde\zeta_n^{h,\delta}\}$ which is useful when there is no control and will be used for the "implicit" approximation to the optimal nonlinear filter in Section 12.7 below. To get this new chain, we just look at $\zeta_n^{h,\delta}$ at the times that $\zeta_{n,0}^{h,\delta}$ changes. In more detail, define $\nu_0 = 0$, and, for $n > 0$ define
$$\nu_n = \min\bigl\{i > \nu_{n-1}: \zeta_{i,0}^{h,\delta} - \zeta_{i-1,0}^{h,\delta} = \delta\bigr\}.$$
Then define $\tilde\zeta_n^{h,\delta} = \zeta_{\nu_n}^{h,\delta}$. Define the continuous parameter interpolation
$$\tilde\zeta^{h,\delta}(t) = \tilde\zeta_n^{h,\delta} \quad \text{for } t\in[n\delta, n\delta+\delta).$$
Suppose that there is no control and denote the one step transition probability of $\{\tilde\zeta_n^{h,\delta}\}$ by $\tilde p^{h,\delta}(x,y)$. This can be calculated as follows. For any function $g(\cdot)$, define $U(\cdot)$ by
$$U(x) = \sum_y p^{h,\delta}(x,n\delta;y,n\delta)\,U(y) + p^{h,\delta}(x,n\delta;x,n\delta+\delta)\,g(x). \qquad (4.8)$$
The solution can be written as
$$U(x) = \sum_y \tilde p^{h,\delta}(x,y)\,g(y). \qquad (4.9)$$
Suitable adaptations of the methods of Chapter 6 can be used to get the $\tilde p^{h,\delta}(x,y)$, and solutions of equations such as (4.8).
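Since (4.8) and (4.9) are linear in $g(\cdot)$, the whole matrix $\tilde p^{h,\delta}$ can be obtained at once by solving one linear system. The following Python sketch assumes a finite grid with the (uncontrolled) implicit transition probabilities stored as arrays; the names and layout are assumptions made only for the illustration.

import numpy as np

def alt_chain_transitions(p_same, p_advance):
    """Sketch: one step transition matrix of the alternative chain via (4.8)-(4.9).
    p_same[x, y] = p^{h,delta}(x, nd; y, nd), p_advance[x] = p^{h,delta}(x, nd; x, nd+d)."""
    n = p_same.shape[0]
    # (4.8) reads U = p_same U + diag(p_advance) g, so U = (I - p_same)^{-1} diag(p_advance) g.
    # Comparing with (4.9), tilde-p^{h,delta} = (I - p_same)^{-1} diag(p_advance).
    return np.linalg.solve(np.eye(n) - p_same, np.diag(p_advance))

Because p_advance is strictly positive, I - p_same is invertible and the rows of the result sum to one, as a transition matrix should.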

12.5 The Optimal Control Problem: Approximations and Dynamic Programming Equations

The Cost Function. The system will be the controlled diffusion (5.3.1) or the jump diffusion (5.6.1). All of the problem formulations of Chapters 10 and 11 can be carried over to the finite time case and the proofs require no change, given the appropriate locally consistent transition probabilities. However, for illustrative purposes, we next set up the computational problem for the case of an absorbing boundary. Assume:

A5.1. $G$ is a compact set in $\mathbb{R}^r$ and is the closure of its interior. $k(\cdot)$ is a bounded continuous function on $G\times[0,T]\times U$, and $g(\cdot)$ is a bounded and continuous function on $\bigl([\mathbb{R}^r - G^0]\times[0,T]\bigr) \cup \bigl(\mathbb{R}^r\times\{T\}\bigr)$.

Define $\tau = \min\{t: x(t)\notin G^0\}$. For an ordinary admissible control $u(\cdot)$, define the cost function
$$W(x,t,u) = E_{x,t}^u\Bigl[\int_t^{T\wedge\tau} k(x(s),s,u(s))\,ds + g\bigl(x(T\wedge\tau), T\wedge\tau\bigr)\Bigr].$$
For an admissible relaxed control $m(\cdot)$, the cost is written as
$$W(x,t,m) = E_{x,t}^m\Bigl[\int_t^{T\wedge\tau}\int_U k(x(s),s,\alpha)\,m(d\alpha\,ds) + g\bigl(x(T\wedge\tau), T\wedge\tau\bigr)\Bigr]. \qquad (5.1)$$
Define the optimal cost, where the infimum is over all admissible controls of the indicated type,
$$V(x,t) = \inf_u W(x,t,u) = \inf_m W(x,t,m).$$

The formal dynamic programming equation for the minimal cost is
$$V_t(x,t) + \min_{\alpha\in U}\bigl[\mathcal{L}^\alpha V(x,t) + k(x,t,\alpha)\bigr] = 0, \qquad (5.2)$$
for $x\in G^0$ and $t<T$, with the boundary conditions $V(x,t) = g(x,t)$ for $x\notin G^0$ or $t\ge T$.

The Computational Approximation: Explicit Method. Let $\delta>0$ satisfy (2.4) and let $p^{h,\delta}(x,y|\alpha)$ be the transition function of a locally consistent approximating Markov chain as derived in Section 12.2. Let $T = N_\delta\,\delta$ for some integer $N_\delta$. Then the dynamic programming equation for the approximating optimal control problem is
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in U}\Bigl[\sum_y p^{h,\delta}(x,y|\alpha)\,V^{h,\delta}(y,n\delta+\delta) + k(x,n\delta,\alpha)\,\delta\Bigr], \qquad (5.3)$$
for $x\in G_h^0$, $n\delta<T$, and with the same boundary condition as for (5.2).
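For a finite control set and a finite grid, (5.3) is a direct backward recursion. A minimal Python sketch follows; the array layout, names, and the omission of any special handling of boundary states are assumptions made only for the illustration.

import numpy as np

def explicit_dp(p, k, g_terminal, delta, N):
    """Sketch of the backward iteration (5.3).  p[a, x, y] ~ p^{h,delta}(x, y|alpha) for
    finitely many control values a; k[a, x, n] ~ k(x, n*delta, alpha); g_terminal[x] is
    the terminal cost at T = N*delta (interior states only, boundary not treated here)."""
    num_x = p.shape[1]
    V = np.empty((num_x, N + 1))
    V[:, N] = g_terminal                                      # boundary condition at t = T
    for n in range(N - 1, -1, -1):
        Q = np.einsum('axy,y->ax', p, V[:, n + 1]) + k[:, :, n] * delta
        V[:, n] = Q.min(axis=0)                               # minimize over the control values
    return V

Each time level costs one matrix-vector product per control value, which is the "simple backward iteration" referred to in Section 12.6 below.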

The Computational Approximation: Implicit Method. Let
$$p^{h,\delta}(x,n\delta;y,n\delta|\alpha), \qquad p^{h,\delta}(x,n\delta;x,n\delta+\delta|\alpha)$$
be the transition function of an approximating Markov chain for the implicit method as derived in Section 12.4, with the interpolation interval $\Delta\bar t^{h,\delta}(x,\alpha)$. Then the dynamic programming equation for the approximating optimal control problem is
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in U}\Bigl[\sum_y p^{h,\delta}(x,n\delta;y,n\delta|\alpha)\,V^{h,\delta}(y,n\delta) + p^{h,\delta}(x,n\delta;x,n\delta+\delta|\alpha)\,V^{h,\delta}(x,n\delta+\delta) + k(x,n\delta,\alpha)\,\Delta\bar t^{h,\delta}(x,\alpha)\Bigr] \qquad (5.4)$$
for $x\in G_h^0$ and $n\delta<T$, and with the same boundary condition as for (5.2).

Combinations of the Explicit and Implicit Method. The explicit Markov chain approximation is a special case of the implicit approximation, because we can always add a "time" component (called $\xi_{n,0}^{h,\delta}$) to $\xi_n^{h,\delta}$ and define the extended state $\bar\xi_n^{h,\delta} = (\xi_n^{h,\delta}, \xi_{n,0}^{h,\delta})$, where $\Delta\xi_{n,0}^{h,\delta} = \delta$ for all $n$.
The two methods can be combined. One can randomize among them. For example, let $p_0\in[0,1]$ and at each step choose the explicit method with probability $p_0$ and the implicit with probability $1-p_0$. Also, $p_0$ can depend on the current value of the state. Whether there is any value to such combinations is not clear.

12.6 Methods of Solution, Decomposition and Convergence

Solving the Explicit Equation (5.3). Because the minimum cost function values $V^{h,\delta}(y,n\delta+\delta)$ at time $n\delta+\delta$ are on the right side of (5.3), while the value at state $x$ at time $n\delta$ is on the left side, (5.3) is solved by a simple backward iteration.

Solving the Implicit Equation (5.4). In (5.4), if the time on the left is $n\delta$ and the state there is $x$, then the only value of the minimum cost function at time $n\delta+\delta$ which is on the right side is $V^{h,\delta}(x,n\delta+\delta)$. The $V^{h,\delta}(y,n\delta)$ on the right, for $y\ne x$, are evaluated at time $n\delta$. Hence, (5.4) cannot be solved by a simple backward iteration. However, all of the methods of Chapter 6 for solving (6.1.1) or (6.1.2) can be used. The general approach is as follows: Let $N\delta = T$. Starting at time $N\delta$, we have the boundary condition $V^{h,\delta}(x,N\delta) = g(x,N\delta)$. Suppose that $V^{h,\delta}(x,n\delta+\delta)$ is available

for some $n < N$. Then there is a $C^h(\cdot)$ such that we can rewrite (5.4) as
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in U}\Bigl[\sum_y p^{h,\delta}(x,n\delta;y,n\delta|\alpha)\,V^{h,\delta}(y,n\delta) + C^h(x,n+1,\alpha)\Bigr] \qquad (6.1)$$
for $x\in G_h$ and with the boundary condition $V^{h,\delta}(x,n\delta) = g(x,n\delta)$ for $x\notin G_h^0$. The $C^h(x,n,\cdot)$ are continuous in $\alpha$.
By the construction of the transition probabilities, $p^{h,\delta}(x,n\delta;x,n\delta+\delta|\alpha) > 0$ for all $\alpha, x$. Hence,
$$\sum_y p^{h,\delta}(x,n\delta;y,n\delta|\alpha) < 1,$$
for all $\alpha$ and $x$. This and the continuity of the transition probabilities imply that the effective transition function in (6.1) is a contraction for all feedback controls, and that the contraction is uniform in the control. Now simply apply to (6.1) any of the methods of Chapter 6 which can be used for (6.1.1) or (6.1.2).
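As an illustration of the iterative solution just described, here is a minimal Python sketch of a Jacobi-type fixed point iteration for one time level of (5.4)/(6.1); the uniform contraction noted above is what guarantees convergence. The array names and layout are assumptions for the sketch only, and any of the more refined methods of Chapter 6 could replace the plain iteration.

import numpy as np

def solve_implicit_level(p_same, p_advance, cost, V_next, tol=1e-10, max_iter=10000):
    """Sketch: solve (6.1) at one time level by fixed point iteration.
    p_same[a, x, y], p_advance[a, x]: implicit transition probabilities per control value;
    cost[a, x] collects k(x, n*delta, a) * Delta bar-t^{h,delta}(x, a);
    V_next[x] = V^{h,delta}(x, n*delta + delta)."""
    V = V_next.copy()                                   # any initial guess works
    for _ in range(max_iter):
        Q = (np.einsum('axy,y->ax', p_same, V)
             + p_advance * V_next[None, :]
             + cost)
        V_new = Q.min(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new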

Decomposition Methods. The parallel processing and decomposition


methods which can be used for solving (6.1.1) or (6.1.2) can also be used
for (5.4) or (6.1). One possible implementation will be illustrated on a one
dimensional problem. The state space is illustrated in Figure 12.1.

Figure 12.1. The decomposition regions.

The spatial interval is $[a,c]$, whose length is a multiple of $h$, and it is discretized by grid points $h$ units apart. Also suppose that $b-a$ is an integral multiple of $h$. The explicit method will be used for updating the values of the minimum value function at $b$. The transitions are illustrated in Figure 12.2a, and are assumed to be to the nearest neighbors only (for our illustrative example).

We suppose that the transition probabilities at points $x\ne b$ are as for the general implicit method of Section 12.4 and are illustrated in Figure 12.2b.

(a) The explicit method. (b) The implicit method.

Figure 12.2: $x\ne a, b, c$.

Now the computational procedure is as follows: For given $n$, let the values $V^{h,\delta}(x,n\delta+\delta)$ be given for all $x$. Calculate $V^{h,\delta}(b,n\delta)$ in terms of these given values via the explicit method. Then, at time $n\delta$, the values of $V^{h,\delta}(x,n\delta)$ are known, except for $x\in G_h^0\cap(a,b)$ and $G_h^0\cap(b,c)$, and the two halves can be worked on simultaneously via the implicit method.
The sequence $V^{h,\delta}(x,n\delta)$ can be shown to converge to the correct value $V(x,t)$ as $h\to0$, $\delta\to0$, and $n\delta\to t$. Clearly, more than one subdivision of the state space can be used and the procedure can also be used in higher dimensions. We have no actual computational experience with the method. See also [96] for an analysis of decomposition algorithms from the point of view of controlled Markov chains.

Convergence Theorems. Under their respective conditions, the finite


time analogues of all of the theorems in Chapters 10 and 11 hold. We need
only extend the definition of the functions which might depend on time, as
done in (A5.1) above.

Alternative Methods. In a series of interesting papers, Hanson [24, 25,


26] has explored the use of Galerkin-type procedures, splitting methods
and parallel computation for problems where the cost is quadratic in the
control and the dynamical term $b(x,\alpha)$ is of the form $b_0(x)\alpha$. The proofs
of convergence require more regularity than we need here, but the results
show considerable promise. At present, it is not clear when those classes of
methods have "Markov chain" interpretations.

12.7 Nonlinear Filtering


The nonlinear filtering problem is one of the basic problems in stochas-
tic control and communication theory. The Markov chain approximation
method is well suited for the computation of approximations to the opti-
mum filters and is the prototype of a large family of effective algorithms.
We first discuss the problem of numerically approximating the distribution
of x(t), an uncontrolled jump diffusion. Then, approximations to the opti-
mal nonlinear filter for a jump diffusion model with white noise corrupted
observations will be developed.

12.7.1 Approximation to the solution of the Fokker-Planck equation
We start with a few formal remarks. Let $x(\cdot)$ be defined by
$$dx = b(x)\,dt + \sigma(x)\,dw, \quad x(0) = x.$$
Let $\mathcal{L}^*$ denote the formal adjoint of the differential operator $\mathcal{L}$ of $x(\cdot)$. Under appropriate regularity conditions [49, 54, 60], $x(t)$ has a density $p(\cdot,\cdot)$ and it is the solution to the Fokker-Planck or "forward" Kolmogorov equation
$$p_t(y,t) = \mathcal{L}^* p(y,t) = \frac12\sum_{i,j}\frac{\partial^2\bigl(a_{ij}(y)\,p(y,t)\bigr)}{\partial y_i\,\partial y_j} - \sum_i \frac{\partial\bigl(b_i(y)\,p(y,t)\bigr)}{\partial y_i}, \qquad (7.1)$$

with the initial condition $p(y,0) = \delta(x-y)$, where $\delta(\cdot)$ is the Dirac delta function; that is, $x(0)$ is concentrated at the initial point $x$. There is an analogous (partial differential integral) equation for the jump diffusion case
$$dx = b(x)\,dt + \sigma(x)\,dw + \int_\Gamma q(x(t-),\gamma)\,N(dt\,d\gamma). \qquad (7.2)$$

For many problems of interest, (7.1) has only a formal meaning, because
either the density does not exist or it is not smooth enough to satisfy (7.1).
Henceforth, when referring to the density, we mean the weak sense density
or (equivalently) the distribution function. We next discuss the problem
of the approximate calculation of the density. For numerical purposes, it
is usually necessary to work in a bounded state space. The state space G
might be bounded either naturally or because a bound was imposed for
numerical reasons. For example, the process dynamics can be such that the
process never leaves some bounded set, even without reflection or absorp-
tion: The process might be reflected back from the boundary of G, or it
might be stopped or killed once it leaves the interior G0 • The exact form
of the process is not important here, provided only that we work with ap-
propriately consistent approximations. Because the process is uncontrolled,

we drop the control parameter in the notation for the approximating transition probability $p^{h,\delta}$. For purposes of simplicity in the exposition and
because the boundary condition plays only a secondary role, we make the
assumption in the general form (A7.1) below. The details can be filled in by
referring to the weak convergence results for the various cases of Chapters
10 and 11.

A7.1. For $\delta>0$ and $h>0$, the function $p^{h,\delta}(x,y)$ is a transition function for an uncontrolled locally consistent explicit Markov chain approximation to a diffusion or jump diffusion $x(\cdot)$ of the type discussed in Section 12.2. The process $x(\cdot)$ might be constrained to the bounded set $G$ by a reflecting boundary condition, or it might be killed on first exit from $G^0$. The continuous parameter interpolation $\xi^{h,\delta}(\cdot)$ converges weakly to $x(\cdot)$ as $h,\delta\to0$.

Let $p^{h,\delta}(x,n\delta,y)$ denote the $n$-step transition function. Then, as $(\delta,h)\to0$ and $n\delta\to t\ge0$, we have
$$p^{h,\delta}(x,n\delta,\cdot) \to p(x,t,\cdot)$$
in the sense that for any bounded and continuous real valued function $\phi(\cdot)$,
$$\sum_y \phi(y)\,p^{h,\delta}(x,n\delta,y) \to \int \phi(y)\,p(x,t,y)\,dy. \qquad (7.3)$$

This is just a consequence of the weak convergence. Thus, explicit Markov chain approximations of the type introduced in Section 12.2 can be used to get approximations to the weak sense density under quite broad conditions. It follows from Section 11.3 that the limit of any weakly convergent subsequence of invariant measures for the chains $\{\xi_n^{h,\delta}, n<\infty\}$ is an invariant measure of $x(\cdot)$. The alternative implicit chains $\{\tilde\zeta_n^{h,\delta}, n<\infty\}$ defined in Section 12.4 can also be used.
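In computational terms, (7.3) says that the weak sense density is approximated by simply propagating a probability vector through the chain. A minimal Python sketch for the explicit chain follows; the array names and the finite-grid layout are assumptions made for the illustration.

import numpy as np

def propagate_density(p, q0, n_steps):
    """Sketch of (7.3): approximate the weak sense density of x(t) at t = n_steps*delta.
    p[x, y] is the one step transition matrix p^{h,delta}(x, y) of the uncontrolled chain,
    and q0 is the initial (weak sense) density on the grid."""
    q = np.asarray(q0, dtype=float)
    for _ in range(n_steps):
        q = q @ p            # q_{n+1}(y) = sum_x q_n(x) p^{h,delta}(x, y)
    return q

The same loop, with the alternative implicit chain's transition matrix in place of p, gives the implicit version mentioned above.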

12.7.2 The nonlinear filtering problem: Introduction and representation
A standard nonlinear filtering problem [51, 87, 116, 158] uses a jump diffusion process model $x(\cdot)$, where the data available at time $t$ is $Y(t) = \{y(s), s\le t\}$, with
$$dy = g(x)\,dt + dw_0, \qquad (7.4)$$


where g(·) is bounded and continuous and w0 (-) is a standard Wiener pro-
cess which is independent of x( ·). We will develop a computational approx-
imation to this filtering problem. Condition (A7.1) will be assumed. First,
we will define the optimal filter via the so-called representation theorem.
342 12. Finite Time and Filtering Problems

Then the optimal nonlinear filter for an approximating process will be de-
fined, and finally the numerical method will be given. The development is
an extension and simplification of that in [90].
For $\phi(\cdot)$ an arbitrary bounded and continuous real valued function, define the conditional expectation operator $E_t$ by
$$E_t\phi(x(t)) = E\bigl[\phi(x(t))\mid Y(t)\bigr].$$
One of the most important results in the theory of nonlinear filtering is the
so-called representation theorem, which is a "limit" form of Bayes rule. We
will use it in the form which was used in the original derivation of the filter
[87], which is particularly convenient for the types of approximation and
weak convergence methods which will be used. That reference derived the
result for the diffusion case. The pure jump process (no diffusion compo-
nent) case was first derived by [136] and [156]. More modern developments
via measure transformation and martingale techniques are in [51, 116], but the representations obtained by those techniques are the same as those used here for the processes of concern.
Let $\tilde x(\cdot)$ be a process with the same probability law as $x(\cdot)$, but which is independent of $(x(\cdot), y(\cdot))$. Define
$$R(t) = \exp\Bigl[\int_0^t g(\tilde x(s))'\,dy(s) - \frac12\int_0^t |g(\tilde x(s))|^2\,ds\Bigr].$$
Then we have the representation [87]
$$E_t\phi(x(t)) = \frac{E\bigl[R(t)\,\phi(\tilde x(t))\mid Y(t)\bigr]}{E\bigl[R(t)\mid Y(t)\bigr]}. \qquad (7.5)$$

Except for a few special cases, the evaluation of (7.5) is not a finite
calculation. The best known cases where the calculation is (essentially)
finite are (i) the Kalman-Bucy filter, where the process is not reflected or
killed, $g(\cdot)$ and $b(\cdot)$ are linear functions of $x$, $q(x,\gamma) = 0$, and $\sigma(\cdot)$ does not
depend on x and (ii) x(·) is a finite state Markov chain. Numerical methods
for the approximate evaluation of the conditional distribution were given
in [90] and related variations were in [39] and [92]. Robustness (locally
Lipschitz continuity) of the numerical approximations with respect to the
observation process was shown in [92] and this property is enjoyed by the
algorithm to be described in Subsection 12.7.3.
It is important to keep in mind that all approximations to the nonlin-
ear filtering problem are actually approximations to some representation
of Bayes rule. Other approximations to Bayes rule for the problem at hand
might lead to preferable procedures in particular cases. The procedures to
be described below are of the same type as in these references, but can use
more general approximating processes. The basic idea depends on two facts.
First, if the signal x( ·) were a finite state discrete time Markov chain, then
12.7 Nonlinear Filtering 343

the conditional distribution can be obtained in principle by a finite calcula-


tion. Then, if the Markov chain is a locally consistent approximation to x( ·)
and the actual physical data from (7.4) is used, then the weak convergence
results from previous sections can be used to show the convergence of the
approximation to the true filter.
We will next state the analogue of (7.5) for a finite state Markov chain.
Then we use (A7.1) to approximate the process x(·) by a locally consistent
Markov chain such as that defined in Section 12.2. The approximating filter
to be defined in Subsection 12.7.3 will be that for the approximating chain,
but will use the actual physical observations (7.4). It will turn out that the
resulting filter will converge to the values given by (7.5) as $h$ and $\delta$ go to
zero. Either the explicit or the implicit approximations can be used.

The Optimal Filter for a Markov Chain Signal Process. Let $\{\xi_n, n<\infty\}$ be a finite state Markov chain with one step transition probabilities $p(x,y)$. Let $v$ be a positive real number and suppose that $\{\psi_n, n<\infty\}$ is a sequence of mutually independent normally distributed random variables with mean zero and covariance $vI$, and which is also independent of the $\{\xi_n, n<\infty\}$. Suppose that we observe the white noise corrupted data $y_n = g(\xi_n) + \psi_n$ at time step $n$, for some bounded function $g(\cdot)$. Define $Y_n = \{y_i, i\le n\}$ and the conditional distribution
$$Q_n(x) = P\{\xi_n = x\mid Y_n\}.$$

We now use Bayes rule to define a convenient recursive formula for $Q_n(x)$. Let the expression $P\{y_n\mid\xi_n=x\} = P\{y_n\mid\xi_n=x,\xi_{n-1}=y\}$ denote the conditional (normal with mean $g(x)$ and covariance $vI$) density of the observation at the value $y_n$. Note that
$$Q_n(x) = \sum_y P\{\xi_n=x\mid\xi_{n-1}=y,y_n\}\,Q_{n-1}(y) = \frac{\sum_y P\{y_n\mid\xi_n=x,\xi_{n-1}=y\}\,P\{\xi_n=x\mid\xi_{n-1}=y\}\,Q_{n-1}(y)}{\text{normalization}}.$$
Substituting in the normal conditional density function of the observation $y_n$, we can rewrite the last expression as
$$Q_n(x) = \frac{\sum_y \exp\bigl[\tfrac1v g(x)'y_n - \tfrac1{2v}|g(x)|^2\bigr]\,p(y,x)\,Q_{n-1}(y)}{\text{normalization}}, \qquad (7.6)$$
where, in both cases, the normalization is just the numerator summed over $x$.

The Optimal Filter for the "Explicit" Chain $\{\xi_n^{h,\delta}, n<\infty\}$ of Sections 12.1 and 12.2. Let us specialize the result (7.6) to the chain introduced in Sections 12.1 and 12.2. Thus, $\xi_n$ is replaced by $\xi_n^{h,\delta}$, and the

transition probability $p(x,y)$ is replaced by $p^{h,\delta}(x,y)$. Let the observation at the $n$-th time step be
$$\Delta y_n^{h,\delta} = g(\xi_n^{h,\delta})\,\delta + \bigl[w_0(n\delta+\delta) - w_0(n\delta)\bigr],$$
where $g(\cdot)$ and $w_0(\cdot)$ are as in (7.4). Thus, $v = \delta$ and the observation function is $g(x)\delta$. Define $Y_n^{h,\delta} = \{\Delta y_i^{h,\delta}, i\le n\}$ and
$$Q_n^{h,\delta}(x) = P\{\xi_n^{h,\delta} = x\mid Y_n^{h,\delta}\}.$$
Define the expression
$$R^{h,\delta}(x,\Delta y_n^{h,\delta}) = \exp\Bigl[g(x)'\,\Delta y_n^{h,\delta} - \frac12|g(x)|^2\,\delta\Bigr].$$
Then we can write (7.6) as
$$Q_n^{h,\delta}(x) = \frac{\sum_y R^{h,\delta}(x,\Delta y_n^{h,\delta})\,p^{h,\delta}(y,x)\,Q_{n-1}^{h,\delta}(y)}{\text{normalization}}, \qquad (7.7)$$
where the normalization is the numerator summed over $x$, and $Q_0^{h,\delta}(x)$, the a priori probability that $\xi_0^{h,\delta} = x$, is a weak sense approximation to the weak sense density of $x(0)$. One can also write (7.7) in the unnormalized form
$$q_n^{h,\delta}(x) = \sum_y R^{h,\delta}(x,\Delta y_n^{h,\delta})\,p^{h,\delta}(y,x)\,q_{n-1}^{h,\delta}(y), \qquad (7.8)$$
where $q_0^{h,\delta}(x) = Q_0^{h,\delta}(x)$ and $q_n^{h,\delta}(x)$ equals $Q_n^{h,\delta}(x)$ times a normalizing factor which depends only on the data $Y_n^{h,\delta}$. Note that (7.8) can be divided into the two steps: first update the effects of the dynamics as in
$$\sum_y p^{h,\delta}(y,x)\,q_{n-1}^{h,\delta}(y),$$
and then incorporate the observation by multiplying by $R^{h,\delta}(x,\Delta y_n^{h,\delta})$.
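The two-step division of (7.8) translates directly into a short recursion per observation. A minimal Python sketch for scalar observations follows; the array names and layout are assumptions for the illustration, and the same step applies to the implicit chain if its transition matrix is used.

import numpy as np

def filter_step(p, g_vals, q_prev, dy, delta):
    """Sketch of one step of the unnormalized filter recursion (7.8).
    p[y, x] ~ p^{h,delta}(y, x); g_vals[x] = g(x) (scalar observations for simplicity);
    q_prev[x] = q_{n-1}^{h,delta}(x); dy is the observation increment over one delta step."""
    q_pred = q_prev @ p                                    # update the dynamics: sum_y p(y,x) q_{n-1}(y)
    R = np.exp(g_vals * dy - 0.5 * (g_vals ** 2) * delta)  # R^{h,delta}(x, dy)
    q_new = R * q_pred                                     # incorporate the observation
    return q_new, q_new / q_new.sum()                      # unnormalized and normalized forms

In the approximation of Subsection 12.7.3 below, dy would be the physical observation increment $y(n\delta+\delta) - y(n\delta)$.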

A Representation of the Form of (7.5). For the purposes of showing convergence, it is useful to put (7.7) and (7.8) into the "functional" form (7.5). Let $\{\tilde\xi_n^{h,\delta}, n<\infty\}$ be a Markov chain with the same transition probabilities $p^{h,\delta}(x,y)$ and which is independent of $\{\xi_n^{h,\delta}, \Delta y_n^{h,\delta}, n<\infty\}$. Define
$$\tilde R_n^{h,\delta} = \exp\Bigl[\sum_{i=1}^n g(\tilde\xi_i^{h,\delta})'\,\Delta y_i^{h,\delta} - \frac12\sum_{i=1}^n |g(\tilde\xi_i^{h,\delta})|^2\,\delta\Bigr].$$
Then (7.7) can be written as
$$Q_n^{h,\delta}(x) = \frac{E\bigl[I_{\{\tilde\xi_n^{h,\delta}=x\}}\,\tilde R_n^{h,\delta}\mid Y_n^{h,\delta}\bigr]}{E\bigl[\tilde R_n^{h,\delta}\mid Y_n^{h,\delta}\bigr]} = \frac{q_n^{h,\delta}(x)}{\sum_y q_n^{h,\delta}(y)}, \qquad (7.9)$$

with the obvious unnormalized version.

The Filter for the "Implicit" Method. Recall the definition of the chain $\{\tilde\zeta_n^{h,\delta}\}$ and its one step transition probabilities $\tilde p^{h,\delta}(x,y)$ defined at the end of Section 12.4. To get the filter for the implicit method, simply replace the $p^{h,\delta}(x,y)$ in (7.7), (7.8), or (7.9) by $\tilde p^{h,\delta}(x,y)$.

12.7.3 The approximation to the optimal filter for x(·), y(·)
The numerical approximation to the optimal filter (7.5) is just either (7.7), (7.8), or (7.9) with the actual physical observations $y(n\delta+\delta) - y(n\delta)$ used in place of $\Delta y_n^{h,\delta}$. Both (7.7) and (7.8) provide recursive formulas which can be used for the actual computation, and one representation for the recursion is in [90, pp. 132-133]. The initial condition $Q_0^{h,\delta}(\cdot)$ is any approximation to the a priori weak sense density of $x(0)$ which converges weakly to that density as $h$ and $\delta$ go to zero. As noted above, these equations can be used for the implicit method also. Note the two step (update dynamics, incorporate the observation) division as below (7.8).

The Convergence Proof. Consider the representations (7.7), (7.8) or (7.9). For $t = n\delta$, define

Let $\xi^{h,\delta}(\cdot)$ be the continuous parameter interpolation defined in Section 12.2. Let $\phi(\cdot)$ be a continuous and bounded real valued function. The value that the numerical approximation (7.7) or (7.9) gives for the estimate of the conditional expectation $E_t\phi(x(t))$ is the right hand side of
13
Controlled Variance and Jumps

In the models dealt with in Chapters 10-12, such as (10.1.1), (11.1.1) and their time varying forms in Chapter 12, neither the noise coefficient $\sigma(\cdot)$ nor the jump coefficient $q(\cdot)$ depended on the control. However, the control dependent forms are treated by methods which are very similar to those used in Chapters 10-12, and identical convergence results are obtainable under
the natural extensions of the conditions used in those chapters. Local con-
sistency remains the primary requirement. Recall that relaxed controls were
introduced owing to the issue of closure: A bounded sequence of solutions
xn(-) under ordinary controls un(-) (for either an ordinary or a stochastic
differential equation) would not necessarily have a subsequence which con-
verged to a limit process which was a solution to the equation driven by an
ordinary control. But, if the controls (whether ordinary or relaxed) were
represented as relaxed controls, then the sequence of (solutions, controls)
was compact, so that we could extract a convergent subsequence and the
limit solution was driven by the limit relaxed control. While the introduc-
tion of relaxed controls enlarged the problem, it did not affect the infimum
of the costs or the numerical method. It was used purely for mathematical
purposes, and not as a practical control.
It will be seen, via simple examples, that analogous issues of closure or
compactness arise when the variance or the jump is controlled, even with
the use of relaxed controls. The problem now is not the relaxation of the
notion of control, but of the driving Wiener or Poisson processes, and leads
to the so-called martingale measure and relaxed Poisson measure. These
concepts are used for mathematical purposes only. They allow the desired
closure and do not affect the infima of the cost functions or the numerical

algorithms.
To simplify the development, we will concentrate on the discounted cost
problem with no stopping time and a reflecting boundary. But extensions
to impulse control, optimal stopping or absorbing boundaries and to the
various combinations of these are all straightforward.
We will start with the variance control problem and show why the driv-
ing Wiener process concept is inadequate in its classical form for the con-
vergence and approximation. Then the extension, the so-called martingale
measure, will be developed. It will then be seen (as in [94, Section 8]) that
all of the approximation and limit theorems of the previous chapters can
be carried over. The use of this extension of the driving process does not
actually alter the model, from the point of view of applications. Then, the
problem with controlled jumps will be treated by an analogous method,
adapted from [100, Chapter 11]. Problems with both controlled variance
and jumps can also be treated, with the obvious combinations of the meth-
ods which are used for each alone. Throughout the section, it is assumed
that (A10.1.1)-(A10.1.4) hold, unless noted otherwise, as do the assump-
tions on the set $G$ and boundary reflection directions in Section 5.7.

13.1 Controlled Variance: Introduction and Martingale Measures

13.1.1 Introduction: The problem of closure
Consider the case where the control also affects the noise, so that (11.1.1) is replaced by
$$x(t) = x + \int_0^t b(x(s),u(s))\,ds + \int_0^t \sigma(x(s),u(s))\,dw(s) + z(t). \qquad (1.1)$$
The jump component does not affect the development of the central issues concerning variance control, and it will not be included in this section. Define $a(x,\alpha) = \sigma(x,\alpha)\sigma'(x,\alpha) = \{a_{ij}(x,\alpha)\}$.
The following examples will illustrate the problems and processes which
arise when we take limits of sequences of solutions corresponding to either
ordinary or relaxed controls.

Example 1. For arbitrary $\alpha_i\in U$, let $u^n(\cdot)$ denote the control which oscillates between $\alpha_1$ and $\alpha_2$ on successive intervals of length $1/n$. Let $m^n(\cdot)$ and $x^n(\cdot)$ denote the associated relaxed control and solution, respectively. Thus

The set $(x^n(\cdot), z^n(\cdot), m^n(\cdot), w^n(\cdot))$ is tight and we need only characterize the weak sense limit process. The differential operator of the solution process under $u^n(\cdot)$, at $x\in G^0$ and arbitrary $t$, is
$$\mathcal{L}^{u^n} f(x) = f_x'(x)\int_U b(x,\alpha)\,m_t^n(d\alpha) + \frac12\operatorname{tr}\Bigl[f_{xx}(x)\Bigl(\int_U a(x,\alpha)\,m_t^n(d\alpha)\Bigr)\Bigr].$$

Take a weakly convergent subsequence of $(x^n(\cdot),z^n(\cdot),m^n(\cdot))$, with weak sense limit denoted by $(x(\cdot),z(\cdot),m(\cdot))$. Then, by standard arguments, one can show that the limit solves the martingale problem with differential operator (written in differential form)

where $z(\cdot)$ is the reflection process. There are two ways of representing the limit as a stochastic differential equation, depending on how the term $\sum_i a(x,\alpha_i)/2$ is factored. The set $(x(\cdot),z(\cdot),m(\cdot))$ can be represented in terms of a single Wiener process in the sense that¹ there is a standard Wiener process $w(\cdot)$, with respect to which the other processes are nonanticipative and such that
$$x(t) = x(0) + \frac12\int_0^t\bigl[b(x(s),\alpha_1)+b(x(s),\alpha_2)\bigr]\,ds + \frac1{\sqrt2}\int_0^t\bigl[a(x(s),\alpha_1)+a(x(s),\alpha_2)\bigr]^{1/2}\,dw(s) + z(t),$$
where any "measurable" square root function can be used.


Owing to the fact that the second order part of the limit differential operator can be split into two parts, each having a different control value, we can also represent the limit in terms of a pair of mutually independent Wiener processes $w_i(\cdot)$, $i=1,2$, in the sense that
$$x(t) = x(0) + \frac12\int_0^t\bigl[b(x(s),\alpha_1)+b(x(s),\alpha_2)\bigr]\,ds + \frac1{\sqrt2}\int_0^t\sigma(x(s),\alpha_1)\,dw_1(s) + \frac1{\sqrt2}\int_0^t\sigma(x(s),\alpha_2)\,dw_2(s) + z(t),$$
where $(x(\cdot),m(\cdot))$ is nonanticipative with respect to the $w_i(\cdot)$, $i=1,2$.

1 Again, possibly having to augment the probability space by the addition of


a Wiener process which is independent of the other processes. In the sequel, it
will be assumed that such an augmentation will be done when needed, and will
not usually be explicitly stated.

Example 2. Now, suppose that $u^n(\cdot)$ alternatively takes values $\alpha_i$, $i=1,\ldots,k$, on time intervals which go to zero as $n\to\infty$, and that there are nonanticipative processes $p_i(\cdot)$ such that the total amount of time on the interval $[0,t]$ that the control $u^n(\cdot)$ takes the value $\alpha_i$ converges to $\int_0^t p_i(s)\,ds$ as $n\to\infty$. Then the limit differential operator is (written in differential form)
$$d\bigl(\mathcal{L}f(x)\bigr) = f_x'(x)\sum_i p_i(t)\,b(x,\alpha_i)\,dt + \frac12\operatorname{tr}\Bigl[f_{xx}(x)\Bigl(\sum_i p_i(t)\,a(x,\alpha_i)\Bigr)\Bigr]\,dt + f_x'(x)\,dz. \qquad (1.3)$$
As for Example 1, $(x(\cdot),z(\cdot),m(\cdot))$ can be represented in terms of a single Wiener process $w(\cdot)$ as
$$x(t) = x + \int_0^t\sum_i p_i(s)\,b(x(s),\alpha_i)\,ds + \int_0^t\Bigl[\sum_i p_i(s)\,a(x(s),\alpha_i)\Bigr]^{1/2}\,dw(s) + z(t), \qquad (1.4)$$
with any "measurable" choice of the square root process used. Again, owing to the way that the second order part of the limit differential operator splits into $k$ parts, each part having a different control value, we can also represent the solution in terms of a set of mutually independent vector-valued Wiener processes $w_i(\cdot)$, $i=1,\ldots,k$, as
$$x(t) = x + \int_0^t\sum_i p_i(s)\,b(x(s),\alpha_i)\,ds + \int_0^t\sum_i\sqrt{p_i(s)}\,\sigma(x(s),\alpha_i)\,dw_i(s) + z(t). \qquad (1.5)$$
Note that the limit relaxed control $m(\cdot)$ is defined by its derivative $m_t(\{\alpha_i\}) = p_i(t)$.

Remarks on the Examples. The representation (1.4) might seem simpler than the representation (1.5) in that there is only a single Wiener process. But the square root of the sum creates problems in analysis. Even though $\sigma(x,\alpha)$ satisfies a uniform (in $\alpha$) Lipschitz condition in $x$, the square root term in (1.4) will not in general be Lipschitz continuous in $x$. It is also difficult to approximate with the representation (1.4). Additionally, the role of the control is vague in (1.4), being contained in both $m(\cdot)$ in the drift term and inside the term whose square root is being taken. These problems only become worse when the control can take a continuum of values, since then the limit $m(\cdot)$ does not appear in a recognizable form in the variance term. The representation (1.5), despite the use of a larger number of Wiener processes, does not suffer from these disadvantages, although the appearance of the $p_i(\cdot)$ in square root form is still awkward. The limit processes $x(\cdot)$ defined by (1.4) and (1.5) are the same, despite the different representations.
The question concerns the representation of the family of possible processes which are limits of solutions to (1.1), analogously to what was done with relaxed controls in the previous chapters. It is clear that the representation of the controlled process in the form (1.1) is not adequate. Keep in mind that we are concerned with convenient representations. The key to resolving the problem of representation lies in the fact that the differential operator of the limit process always has the form (written in differential notation)
$$d\bigl(\mathcal{L}f(x)\bigr) = f_x'(x)\int_U b(x,\alpha)\,m_t(d\alpha)\,dt + \frac12\operatorname{tr}\Bigl[f_{xx}(x)\int_U a(x,\alpha)\,m_t(d\alpha)\Bigr]\,dt + f_x'(x)\,dz. \qquad (1.6)$$

To prepare for what follows, let us write (1.5) in an equivalent form. Let $\{\mathcal{F}_t, t\ge0\}$ denote the filtration engendered by the processes $\{x(\cdot),z(\cdot),m(\cdot),w_i(\cdot),i\le k\}$ in (1.5). Define the (vector) measure-valued $\mathcal{F}_t$-martingale process $M(\cdot)$ by its values on the sets $A\in\mathcal{B}(U)$:
$$M(A,dt) = \sum_i \sqrt{p_i(t)}\,dw_i(t)\,I_{\{\alpha_i\in A\}}. \qquad (1.7)$$
Note that the quadratic variation process of the measure-valued martingale process is the measure-valued process $m(\cdot)I$. Now, rewrite (1.5) in terms of $M(\cdot)$ as:
$$x(t) = x + \int_0^t\int_U b(x(s),\alpha)\,m(d\alpha\,ds) + \int_0^t\int_U \sigma(x(s),\alpha)\,M(d\alpha\,ds) + z(t). \qquad (1.8)$$
It is seen that the number of Wiener processes that are needed for the rep-
resentation depends on the number of values that the control takes. But the
form {1.8), which represents the stochastic integral in terms of a measure-
valued martingale which is obtained from those Wiener processes, suggests
a convenient representation, no matter how many values the control takes.
This leads to the notion of a martingale measure.
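As a quick check of the assertion below (1.7), the quadratic variation of $M(\cdot)$ can be computed directly in the setting of Example 2, using the mutual independence of the $w_i(\cdot)$ and the fact that $m_s(\{\alpha_i\}) = p_i(s)$:
$$\langle M(A,\cdot),M(B,\cdot)\rangle(t) = \sum_i\int_0^t p_i(s)\,I_{\{\alpha_i\in A\}}I_{\{\alpha_i\in B\}}\,ds = \int_0^t m_s(A\cap B)\,ds = m(A\cap B,t),$$
which is the defining property of the quadratic variation process $m(\cdot)I$ used in the next subsection.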

13.1.2 Martingale measures


Let $\{\mathcal{F}_t, t\ge0\}$ be a filtration on the probability space and $m(\cdot)$ a relaxed control which is adapted to the filtration. First, let $M(\cdot)$ be a (real) measure-valued continuous random process defined on the probability space

with the following properties. For each $U_0\in\mathcal{B}(U)$, $M(U_0,\cdot)$ is a continuous $\mathcal{F}_t$-martingale which satisfies $M(U_0,0)=0$ and $EM(U_0,t)^2<\infty$ for each $t\ge0$. If $U_i\in\mathcal{B}(U)$ are disjoint, then $M(U_1\cup U_2,\cdot) = M(U_1,\cdot) + M(U_2,\cdot)$, and the two components are orthogonal martingales. Let $\sigma(\cdot)$ have $r$ rows (the dimension of $x$) and $p$ columns (the dimension of the Wiener process $w(\cdot)$ in (1.1)). In general, $M(\cdot)$ will be vector-valued. Its real-valued components $M_i(\cdot)$, $i=1,\ldots,p$, are measure-valued martingales, with orthogonal components; i.e., the martingales defined by $M_i(A,\cdot)$, $M_j(B,\cdot)$ are orthogonal for all $A,B\in\mathcal{B}(U)$ and all pairs $i\ne j$. The quadratic variation process is $\langle M_i(A,\cdot), M_j(B,\cdot)\rangle(t) = \delta_{ij}\,m(A\cap B,t)$, where $m(\cdot)$ is the relaxed control. The process defined by (1.7) is a simple example of a martingale measure. In fact, it is the canonical example, if the measure is supported by a finite number of points.

Tightness. Let $M^n(\cdot)$ (for each $n$) and $M(\cdot)$ be martingale measures (with respect to the filtrations which they engender) with quadratic variation processes $m^n(\cdot)I$ (respectively, $m(\cdot)I$), where the $m^n(\cdot)$ and $m(\cdot)$ are relaxed controls. The weak topology is used on the space of martingale measures. Thus, $(M^n(\cdot),m^n(\cdot))$ converges weakly to $(M(\cdot),m(\cdot))$ if and only if
$$\Bigl\{\int_0^t\int_U f_i(s,\alpha)\,M^n(d\alpha\,ds),\,i\Bigr\} \Rightarrow \Bigl\{\int_0^t\int_U f_i(s,\alpha)\,M(d\alpha\,ds),\,i\Bigr\}$$
for each finite, bounded and continuous set $\{f_i(\cdot)\}$. By the compactness of $U$, any sequence $\{M^n(\cdot),m^n(\cdot)\}$ is tight.

Stochastic Integrals. For full information on martingale measures, see


[50]. They were first used in [94] for the problem of variance control and
for the representation of solutions as done here, and to help prove the
convergence of numerical approximations. Stochastic integrals are defined
with respect to martingale measures in the same way that they are de-
fined with respect to continuous martingales. Let the probability space be $(\Omega,P,\mathcal{F})$, with a filtration $\{\mathcal{F}_t,t\ge0\}$, and where $\mathcal{F}=\mathcal{F}_\infty$. Let $M(\cdot)$ be an $\mathcal{F}_t$-martingale measure with quadratic variation process $m(\cdot)I$. Define the predictable $\sigma$-algebra $\mathcal{F}_p$ to be the smallest sub-$\sigma$-algebra of $\mathcal{F}\times\mathcal{B}([0,\infty))$ which measures the real (or vector) valued processes which are piecewise constant, $\mathcal{F}_t$-adapted and left continuous. A process which is $\mathcal{F}_p$-measurable is said to be predictable. Let the $f(\cdot)$ below be real (or vector) valued functions of $(\omega,t,\alpha)$ which are measurable on the product of the predictable $\sigma$-algebra $\mathcal{F}_p$ and $\mathcal{B}(U)$.
Start by defining the stochastic integral
$$\int_0^t\int_U f(s,\alpha)\,M(d\alpha\,ds), \qquad (1.9)$$

of processes $f(\cdot)$ which are piecewise constant, bounded, and depend on only a finite number of values of $\alpha$. The quadratic variation of the process defined by (1.9) is $\int_0^t\int_U f^2(s,\alpha)\,m_s(d\alpha)\,ds$. In this case, the process defined by (1.9) is equivalent to a stochastic integral with respect to a finite number of Wiener processes. Then, extend to $f(\cdot)$ still depending on only a finite number of values of $\alpha$ but satisfying
$$\int_0^t\int_U f^2(s,\alpha)\,m_s(d\alpha)\,ds < \infty \qquad (1.10)$$
with probability one for each $t$. Finally, approximate an $f(\cdot)$ with a general dependence on $\alpha$ by a sequence of simple functions, analogously to what is done for the real valued martingale case. (See the remarks on the proof of Theorem 1.2.)

Stochastic Differential Equations. The existence of a (weak sense) solution to (1.8) for each pair $(M(\cdot),m(\cdot))$ follows from Theorem 1.1. Suppose that:

A1.1. For each pair $(M(\cdot),m(\cdot))$ there is a unique weak sense solution to (1.8).

Condition (A1.1) can be verified under practical conditions on $b(\cdot)$, $\sigma(\cdot)$ and on the reflection directions. See [100, Chapter 3], and one set of sufficient conditions will now be given for strong sense existence and uniqueness. For the problem with an absorbing boundary, where we are concerned only with the solution up to the point of absorption, it is sufficient that $b(x,\alpha)$, $\sigma(x,\alpha)$ be continuous and Lipschitz continuous in $x$, uniformly in $\alpha\in U$, a compact set. The usual Picard iteration method gives the desired result.
For the absorbing boundary case, the situation is similar to the case where $M(\cdot)$ is replaced by a Wiener process. Consider a case which is typical in heavy traffic modeling of queueing systems. Then, keep the above Lipschitz condition and let $G$ be a bounded hyperrectangle which is the closure of its interior. Fix a corner of $G$, and let $d_i$, $i\le r$, denote the reflection directions on the adjoining faces. For simplicity, assume the corner is the origin. (In the general case, $1-q_{ii}$ below will be replaced by $-1+q_{ii}$ if $-e_i$ is an inward normal to $G$ at the corner. The spectral radius condition will still be applied to $|q_{ij}|$.) Normalize such that

The $1-q_{ii}$ is in the $i$-th place. Suppose that the spectral radius of the matrix $\{|q_{ij}|\}$ is less than unity. Consider the problem $\phi_i(t) = \psi_i(t) + z_i(t)$, where the $\psi_i(\cdot)$ are in $D^r[0,\infty)$ and $z_i(\cdot)$ is the reflection term. Then [42], [100, Section 3.4], there is a constant $C$, not depending on the $\psi_i(\cdot)$, such that

Using this, the conditions on $(b(\cdot),\sigma(\cdot))$ and the Picard iteration establish that, for any $(M(\cdot),m(\cdot))$, there is a strong sense unique solution to (1.8).
The process defined by (1.8) is the natural representation for the problem where the variance is controlled. Suppose that $(x(\cdot),m(\cdot))$ solves the martingale problem with differential operator defined by (in differential form)
$$d\bigl(\mathcal{L}^m f(x)\bigr) = \int_U \mathcal{L}^\alpha f(x)\,m_t(d\alpha)\,dt + f_x'(x)\,dz, \qquad (1.11)$$
where $\mathcal{L}^\alpha$ is the operator with control fixed at $\alpha$. Then (by augmenting the space if necessary, by the addition of an "independent process" [50]), there always is a martingale measure such that (1.8) holds.

Weak Convergence. Let $(x^n(\cdot),z^n(\cdot),m^n(\cdot),M^n(\cdot))$ solve (1.8), where $M^n(\cdot)$ is a martingale measure with respect to a filtration $\{\mathcal{F}_t^n\}$, with quadratic variation process $m^n(\cdot)$, a relaxed control, and $(x^n(\cdot),z^n(\cdot))$ are $\mathcal{F}_t^n$-adapted. Then we have the following theorem.

Theorem 1.1. Assume the conditions in the introduction of the chapter. The set $(x^n(\cdot),z^n(\cdot),m^n(\cdot),M^n(\cdot))$ is tight. Let $(x(\cdot),z(\cdot),m(\cdot),M(\cdot))$ denote the limit of a weakly convergent subsequence, and let $\{\mathcal{F}_t,t\ge0\}$ denote the filtration which it engenders. $M(\cdot)$ is a martingale measure (with respect to $\{\mathcal{F}_t,t\ge0\}$) with quadratic variation process $m(\cdot)$. The limit satisfies (1.8), where $(x(\cdot),z(\cdot))$ are nonanticipative with respect to $\{\mathcal{F}_t,t\ge0\}$. The boundary estimates of Theorem 11.1.1 hold.

Remarks on the Proof. Given $\Delta>0$, we can write
$$x^n(t) = x + \int_0^t\int_U b(x^n(s),\alpha)\,m^n(d\alpha\,ds) + \sum_{i\Delta<t}\int_{i\Delta}^{i\Delta+\Delta}\int_U \sigma(x^n(i\Delta),\alpha)\,M^n(d\alpha\,ds) + z^n(t) + \rho_\Delta^n(t), \qquad (1.12)$$
where, for each $T<\infty$,
$$\lim_{\Delta\to0}\sup_n E\,\sup_{s\le T}|\rho_\Delta^n(s)| = 0.$$
By the weak convergence, (1.12) holds with the superscript $n$ dropped. The proof of nonanticipativeness follows the lines used in the proof of Theorem 10.1.1, and the details are omitted. Given the nonanticipativeness, let $\Delta\to0$ to get the representation (1.8).

13.1.3 Approximations of an optimal control and convergence


The cost function is assumed to take one of the forms in (11.1.3). Keep in mind that the cost function depends on the pair $(m(\cdot),M(\cdot))$ and not just on $m(\cdot)$. Theorem 1.1 implies that there is an optimal control in the class of
models (1.8). All of the results remain true if a stopping time or boundary
absorption is added, with the associated conditions from Chapters 10 or 11
used, and analogously for the finite time problems of Chapter 12.
Analogously to what was done in Sections 10.2 and 10.3, we need to know that
$$\inf_{m,\ \text{model }(1.1)} W(x,m) = \inf_{m,\ \text{model }(1.8)} W(x,m), \qquad (1.13)$$

and that any process of the type (1.8) and its associated cost can be arbi-
trarily well approximated by a classical controlled diffusion process and its
associated cost. This is implied by the next theorem.

Theorem 1.2. Assume the conditions in the introduction of the chapter and (A1.1). Given a solution $(x(\cdot),z(\cdot),M(\cdot),m(\cdot))$ to (1.8), there are a sequence of ordinary controls $u^\delta(\cdot)$, each taking a finite number of values $\alpha_i^\delta$, $i\le k_\delta$, and being piecewise constant, standard Wiener processes $w^\delta(\cdot)$, and associated solutions $(x^\delta(\cdot),z^\delta(\cdot))$ to (1.1), where $(x^\delta(\cdot),z^\delta(\cdot),u^\delta(\cdot))$ is nonanticipative with respect to $w^\delta(\cdot)$, and such that $(x^\delta(\cdot),z^\delta(\cdot))$ converges weakly to $(x(\cdot),z(\cdot))$ and

$$W(x,u^\delta) \to W(x,m). \qquad (1.14)$$

Remarks on the Proof. Start by approximating $(x(\cdot),z(\cdot),M(\cdot),m(\cdot))$ by concentrating the measure on a finite number of points. Given $\delta>0$, the first approximation will be of the form
$$x^\delta(t) = x + \int_0^t\int_U b(x^\delta(s),\alpha)\,m_s^\delta(d\alpha)\,ds + \int_0^t\int_U \sigma(x^\delta(s),\alpha)\,M^\delta(d\alpha\,ds) + z^\delta(t), \qquad (1.15)$$
where $m_s^\delta(\cdot)$ is concentrated on a finite number of points. Hence, $M^\delta(\cdot)$ can be written in terms of a finite number of Wiener processes and control values.
The approximation is constructed as follows. Let $U_i^\delta$, $i\le k_\delta<\infty$, with $\alpha_i^\delta\in U_i^\delta$, be a disjoint covering of $U$, where the maximal diameter of the sets goes to zero as $\delta\to0$. Then $M(U_i^\delta,\cdot)$, $i\le k_\delta$, are orthogonal martingales, with quadratic variation processes $m(U_i^\delta,\cdot)I$, $i\le k_\delta$, respectively. There are mutually independent standard vector-valued Wiener processes $w_i^\delta(\cdot)$ such that we have the representation
$$M(U_i^\delta,t) = \int_0^t \bigl[m_s(U_i^\delta)\bigr]^{1/2}\,dw_i^\delta(s). \qquad (1.16)$$

Let $M^\delta(\cdot), m^\delta(\cdot)$ denote the measures defined by $M^\delta(\{\alpha_i^\delta\},t) = M(U_i^\delta,t)$ and $m^\delta(\{\alpha_i^\delta\},t) = m(U_i^\delta,t)$. We then obtain an approximation of the form (1.15):
$$x^\delta(t) = x + \sum_i\int_0^t b(x^\delta(s),\alpha_i^\delta)\,m_s(U_i^\delta)\,ds + \sum_i\int_0^t \sigma(x^\delta(s),\alpha_i^\delta)\bigl[m_s(U_i^\delta)\bigr]^{1/2}\,dw_i^\delta(s) + z^\delta(t). \qquad (1.17)$$
The sequence $(x^\delta(\cdot),z^\delta(\cdot),M^\delta(\cdot),m^\delta(\cdot))$ converges weakly to the limit $(x(\cdot),z(\cdot),M(\cdot),m(\cdot))$, and this implies that the asserted weak convergence and (1.14) hold for $(x^\delta(\cdot),z^\delta(\cdot))$. Given the approximation (1.17), we can further approximate the controls by ordinary controls which are piecewise constant, which, in turn, allows us to collapse the $k_\delta$ Wiener processes into one and leads to the weak convergence and (1.14).

Convergence of the Markov Chain Approximation. Let $\psi^h(\cdot)$ be the continuous parameter interpolation of a Markov chain approximation, with control $u^h(\cdot)$ whose relaxed control representation is $m^h(\cdot)$.

Theorem 1.3. Assume the conditions in the introduction of the chapter and (A1.1). Let $\psi^h(\cdot)$ be locally consistent with (1.1), with the step sizes going to zero as $h\to0$. Then $(\psi^h(\cdot),z^h(\cdot),m^h(\cdot))$ is tight. Let $(x(\cdot),z(\cdot),m(\cdot))$ denote a weak sense limit, and suppose that it engenders a filtration $\{\mathcal{F}_t,t\ge0\}$. Then (with possibly an augmentation of the probability space), there is a martingale measure $M(\cdot)$, with respect to $\{\mathcal{F}_t,t\ge0\}$, and with quadratic variation process $m(\cdot)$ such that (1.8) holds, where $(x(\cdot),z(\cdot))$ is nonanticipative.

Remark on the Proof. The tightness follows from the results of Chapter 11. Recall the definition of $w^h(\cdot)$ in (10.4.5). Define the measure valued process $M^h(\cdot)$ by
$$M^h(A,t) = \int_0^t I_{\{u^h(s)\in A\}}\,dw^h(s),$$
for $A\in\mathcal{B}(U)$. Then $(\psi^h(\cdot),z^h(\cdot),M^h(\cdot),m^h(\cdot))$ is also tight and the limit $(x(\cdot),z(\cdot),M(\cdot),m(\cdot))$ of any weakly convergent subsequence satisfies (1.8).

Theorem 1.4. Assume the conditions in the introduction of the chapter and (A1.1). Let $\psi^h(\cdot)$ be locally consistent with (1.1), with the step sizes going to zero as $h\to0$. Then
$$V^h(x) \to V(x). \qquad (1.19)$$

Remark on the Proof. The previous theorem implies that $\liminf_h V^h(x) \ge V(x)$. For $\epsilon>0$, let $(x(\cdot),z(\cdot),M(\cdot),m(\cdot))$ be an $\epsilon$-optimal solution. Approximate it as in Theorem 1.2, by use of a finite dimensional Wiener process and a finite valued and piecewise constant control. Then proceed as in Theorem 10.5.2.

13.2 Controlled Jumps


13.2.1 Introduction: The problem of closure
We now turn to the problem when the jump is controlled. For a filtration $\{\mathcal{F}_t, t\ge0\}$, let $w(\cdot)$ be a standard $\mathcal{F}_t$-Wiener process, $N(\cdot)$ an $\mathcal{F}_t$-Poisson measure and $u(\cdot)$ (the control) an $\mathcal{F}_t$-predictable and $U$-valued process. The jump rate of $N(\cdot)$ is $\lambda<\infty$ and the jump distribution is $\Pi(\cdot)$, where the jumps are confined to a compact set $\Gamma$. The system model of concern is
$$x(t) = x + \int_0^t b(x(s),u(s))\,ds + \int_0^t \sigma(x(s))\,dw(s) + \int_0^t\int_\Gamma q(x(s-),\gamma,u(s))\,N(ds\,d\gamma) + z(t). \qquad (2.1)$$

Note how the control affects the jump. The jump still occurs at random,
and the controller does not know the jump times until they occur. Thus,
the jumps cannot be controlled directly, but only via the overall control
policy. The cost functions (11.1.3) will be used. But, as in the previous
section, all of the results remain true if a stopping time or boundary ab-
sorption is added, with the associated conditions from Chapters 10 or 11
used, and analogously for the finite time problems of Chapter 12. Also,
variance control can be combined with jump control.
Such controlled jumps have arisen in telecommunications, and an exam-
ple from a polling problem where some queue is occasionally unavailable to
be served is in [4, 100]. The example concerns a wireless communications
system where the sources are sending data which is created in a random
and bursty way, buffered until transmitted, and the sources can be occa-
sionally unavailable to the fixed base station antenna due to their physical
movement. The state is the total amount of work that is in the buffers of
all of the sources and the control policy is the balance of the total buffered
work between the sources. The jump is due to the increase in work if one
source becomes unavailable for a period of time which is longer than what
is needed to handle all of the work in the other available sources plus what
arrives during that interval.
The issues of convergence are similar to those that arose with variance
control in Section 1, and which led to the introduction of the (control
dependent) Martingale measure as the basic driving process. If m n ( ·) is
a sequence of admissible relaxed controls with corresponding solutions
(xn (-), zn (·)), then there might be a (weakly) convergent subsequence of
358 13. Controlled Variance and Jumps

(xn(-), zn(·), mn(·)) whose limit does not satisfy (2.1) for some Wiener pro-
cess, Poisson measure and admissible control u(·). Even in the relaxed con-
trol framework, the best way of representing the limit controlled jump term
is not a priori clear: The limit of the distributions of the jumps depends on
the limits of the set of control values at the random times of the jumps. The
derivative $m_t^n(\cdot)$ of the relaxed control is a measure that is concentrated at $u^n(t)$. But the derivative $m_t(\cdot)$ of the limit relaxed control is defined only almost everywhere, and is not necessarily a limit of the $m_t^n(\cdot)$.
As done with both relaxed controls and martingale measures, to get the
desired closure or compactness, it is necessary to enlarge the model. This
will be done by introducing the concept of relaxed Poisson measure as a
driving process to replace the Poisson measure. The relaxed Poisson mea-
sure functions essentially as did the martingale measure of the last section.
It enlarges the problem, enabling convergence theorems to be proved, but
it does not change the infimum of the costs. The following assumption will
be used.

A2.1. For each initial condition and admissible pair (w(·), m(·)), there is a
weak sense solution to the system (2.1) without the jumps, and it is unique
in the weak sense.

Let $\tau$ denote the time of the first jump of
$$\rho(t) = \int_0^t\int_\Gamma \gamma\,N(ds\,d\gamma).$$
The solution to (2.1) on $[0,\tau)$, whether or not the jump time and value are controlled, does not depend on the value of the jump. As a consequence, $x(t)$ is well defined up till the time of the first jump, and so the distribution of the first jump is also well defined. One can proceed in this way to define the solution for all $t$ for (2.1). By the continuity of $q(x,\gamma,\alpha)$ in $(x,\alpha)$ for each value of $\gamma$, the distribution of the jump at $\tau$ is weakly continuous in the control $u(\tau)$ and state value $x(\tau-)$, and $u(\tau)$ can depend only on the system data up to time $\tau-$.

A Motivating Example. The following example, which is an analogue of the examples of Section 1, will illustrate the underlying issue of "closure" and guide us to the solution.
Suppose that the admissible control $u(t)$ takes the two values $\alpha_i$, $i=1,2$. Divide time into intervals of length $\delta>0$, and divide each of these into subintervals of lengths $v_1\delta, v_2\delta$, where $v_1+v_2=1$. Use the control value $\alpha_1$ on $[k\delta, k\delta+v_1\delta)$ and use $\alpha_2$ on $[k\delta+v_1\delta, k\delta+\delta)$, $k=0,1,\ldots$. Let $x^\delta(\cdot)$ (respectively, $u^\delta(\cdot)$) denote the associated solution (respectively, control) to (2.1). Let $I_i^\delta(s)$ denote the indicator function of the event that $\alpha_i$ is used at time $s$. Then the jump term in (2.1) takes the form
$$J^\delta(t) = \sum_i\int_0^t\int_\Gamma I_i^\delta(s)\,q(x^\delta(s-),\gamma,\alpha_i)\,N(ds\,d\gamma). \qquad (2.2)$$
Let $m^\delta(\cdot)$ denote the relaxed control representation of $u^\delta(\cdot)$. Then $m_t^\delta(\alpha_i) = I_i^\delta(t)$. Let $\delta\to0$. Then $m^\delta(\cdot)$ converges weakly to $m(\cdot)$ with $m_t(\alpha_i) = v_i$. The set (over $\delta$ and the jump index) of all jumps is tight, as is the set of interjump sections of $x^\delta(\cdot)$. Fix a weakly convergent subsequence of the interjump sections and the jumps. Then (between jumps) the limit of the chosen subsequence can be represented as

$$x(t) = x + \int_0^t\int_U b(x(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t \sigma(x(s))\,dw(s) + z(t). \qquad (2.3)$$
The limit of $J^\delta(\cdot)$ along the chosen subsequence can be expressed in the form
$$\sum_i\int_0^t\int_\Gamma q(x(s-),\gamma,\alpha_i)\,N_i(ds\,d\gamma), \qquad (2.4)$$
where $N_i(\cdot)$, $i=1,2$, are mutually independent Poisson measures with jump distributions $\Pi(\cdot)$ and jump rates $v_i\lambda$. The limit $(w(\cdot),N_1(\cdot),N_2(\cdot),m(\cdot))$ is admissible.
The form (2.4) emphasizes the fact that the control value which affects
the jump is the result of a randomization. This type of approximation and
weak convergence analysis could be carried out for any number of values
of the control. It can also be adapted to the case where the fractions of the
intervals on which the $\alpha_i$ are used are time dependent in a nonanticipative way. For example, let $m^\delta(\cdot)$ denote the relaxed control representation of an $\mathcal{F}_t$-predictable process $u^\delta(\cdot)$ which takes only a finite number of values. Let $x^\delta(\cdot)$ denote the associated solution and let $(x^\delta(\cdot),m^\delta(\cdot))$ converge weakly to $(x(\cdot),m(\cdot))$. Let $\{\mathcal{F}_t,t\ge0\}$ denote the filtration which this process induces. Then² there is a standard $\mathcal{F}_t$-Wiener process $w(\cdot)$ and $\mathcal{F}_t$-adapted counting-measure valued processes $N_i(\cdot)$, $i=1,2$, such that the set solves (2.3) between jumps and the jumps are represented by (2.4), but where the former jump rate $v_i\lambda$ of $N_i(\cdot)$ is replaced by the random and time varying (always $\mathcal{F}_t$-predictable) quantity $\lambda m_t(\alpha_i)$. Thus, the limit of the jump term can be represented in terms of a set of (extended) Poisson measures with jump rates depending on the derivative of the limit relaxed control. The $N_i(\cdot)$, $i=1,2$, would not be independent, but the martingales defined

2 Perhaps requiring the augmentation of the probability space by a Wiener


process and Poisson measure which are independent of the other processes.

by
$$\int_0^t\int_\Gamma I_i^\delta(s)\,q(x^\delta(s-),\gamma,\alpha_i)\,N(ds\,d\gamma) - \lambda\int_0^t\int_\Gamma q(x^\delta(s-),\gamma,\alpha_i)\,\Pi(d\gamma)\,m_s^\delta(\alpha_i)\,ds \qquad (2.5)$$
converge weakly to the processes
$$\int_0^t\int_\Gamma q(x(s-),\gamma,\alpha_i)\,N_i(ds\,d\gamma) - \lambda\int_0^t\int_\Gamma q(x(s-),\gamma,\alpha_i)\,\Pi(d\gamma)\,m_s(\alpha_i)\,ds, \qquad (2.6)$$
which are orthogonal $\mathcal{F}_t$-martingales.
There is an alternative representation of (2.4) which is sometimes useful. Extend the Poisson measures $(N_i(\cdot), i=1,2)$ as follows. Let $\Pi_0(\cdot)$ be Lebesgue measure on $[0,1]$, and let $N(ds\,d\gamma\,d\gamma_0)$ denote the Poisson measure with jump rate $\lambda$ and jump distribution $\Pi(d\gamma)\Pi_0(d\gamma_0)$. Let the control take a finite number of values $\{\alpha_i, i\le k\}$, and define $\mu_0(t)=0$ and $\mu_i(t) = \sum_{j=1}^i m_t(\alpha_j)$ for $i\ge1$. Then write the $i$-th summand in (2.4) in the form
$$\int_0^t\int_\Gamma\int_0^1 I_{\{\gamma_0\in(\mu_{i-1}(s),\mu_i(s)]\}}\,q(x(s-),\gamma,\alpha_i)\,N(ds\,d\gamma\,d\gamma_0). \qquad (2.7)$$

The representation (2.7) yields a process (interjump and jump) with the
same probability distribution as (2.4). The form of (2.7) emphasizes, again,
that the actual realization of the jump value is determined by a random-
ization via the relaxed control measure. The representations differ only in
the realization of the randomization. The presence of the discontinuous in-
dicator function in (2. 7) does not affect the existence, uniqueness or the
approximation arguments, since it does not depend on the state. This rep-
resentation in terms of a set of mutually independent Poisson measures
works only if the control takes a finite or countable number of values.
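To see why (2.7) reproduces the randomization in (2.4), note that for a jump of $N$ at time $s$ with mark $(\gamma,\gamma_0)$, the component $\gamma_0$ is uniformly distributed on $[0,1]$ and independent of the rest, so
$$P\bigl\{\gamma_0\in(\mu_{i-1}(s),\mu_i(s)]\mid\mathcal{F}_{s-}\bigr\} = \mu_i(s)-\mu_{i-1}(s) = m_s(\alpha_i).$$
Hence the $i$-th term of (2.7) collects jumps at the ($\mathcal{F}_t$-predictable) rate $\lambda m_s(\alpha_i)$, with jump distribution $\Pi(\cdot)$, which is exactly the behavior of the $i$-th summand in (2.4) with the time varying rates described above.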

Recapitulation. The above discussion suggests a generalization of the concept of Poisson measure which would allow the use of a continuum of control values within a well defined framework. In preparation for this, let $\{\mathcal{F}_t,t\ge0\}$, $w(\cdot)$, $N(\cdot)$ be as in the introduction to the section. Let $u(\cdot)$ be an arbitrary admissible control with relaxed control representation $m(\cdot)$ and define the measure valued process $N_m(ds\,d\gamma\,d\alpha)$ as follows. Let $\Gamma_0\in\mathcal{B}(\Gamma)$ and $U_0\in\mathcal{B}(U)$. Then define $N_m([0,t]\times\Gamma_0\times U_0) = N_m(t,\Gamma_0,U_0)$ to be the number of jumps of $\int_0^t\int_\Gamma\gamma\,N(ds\,d\gamma)$ on $[0,t]$ with values in $\Gamma_0$, and where $u(s)\in U_0$ at the jump times $s$. The stochastic model can then be written as
$$x(t) = x + \int_0^t\int_U b(x(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t\sigma(x(s))\,dw(s) + \int_0^t\int_U\int_\Gamma q(x(s-),\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha) + z(t). \qquad (2.8)$$

The compensator of the counting-measure valued process $N_m(\cdot)$ is the integral of
$$\lambda\,\Pi(d\gamma)\,m_t(d\alpha)\,dt \qquad (2.9)$$
in the sense that the processes defined by $N_m(t,\Gamma_0,U_0) - \lambda\Pi(\Gamma_0)m(t,U_0)$ are $\mathcal{F}_t$-martingales and are orthogonal for disjoint $\Gamma_0\times U_0$. This follows from the facts that $I_{U_0}(u(\cdot))$ is progressively measurable, that
$$\int_0^t\int_{\Gamma_0} I_{U_0}(u(s))\,N(ds\,d\gamma) = N_m(t,\Gamma_0,U_0),$$
and that the left hand side has compensator $\lambda\Pi(\Gamma_0)m(t,U_0)$ (which follows from $I_{U_0}(u(s)) = m_s(U_0)$). Furthermore, for bounded and measurable real-valued functions $\phi(\cdot)$, the process defined by
$$\int_0^t\int_U\int_\Gamma \phi(s,\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha) - \int_0^t\int_U\int_\Gamma \phi(s,\gamma,\alpha)\,\lambda\Pi(d\gamma)\,m_s(d\alpha)\,ds \qquad (2.10)$$
is also an $\mathcal{F}_t$-martingale. Define
$$\rho(t) = \int_0^t\int_U\int_\Gamma \phi(s,\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha),$$
and let $f(\cdot)$ be a bounded and continuous real valued function. Then the compensator for $f(\rho(\cdot))$ is
$$A(t) = \int_0^t\int_U\int_\Gamma \bigl[f(\rho(s)+\phi(s,\gamma,\alpha)) - f(\rho(s))\bigr]\,\lambda\Pi(d\gamma)\,m_s(d\alpha)\,ds$$
in the sense that $f(\rho(t)) - f(\rho(0)) = A(t)$ plus an $\mathcal{F}_t$-martingale.


The set of integrands can be extended. The martingale property of (2.10) holds if any real valued, bounded $\mathcal{F}_t$-predictable process $\phi_0(\cdot)$ multiplies $\phi(\cdot)$. Any left continuous and $\mathcal{F}_t$-adapted process is predictable. Note that $m_t(U_0)$ is predictable for any $U_0\in\mathcal{B}(U)$, by its definition as the limit (for almost all $\omega,t$) of a sequence of predictable processes: $m_t(U_0) = \lim_{\delta\to0}[m(t,U_0) - m(t-\delta,U_0)]/\delta$.

13.2.2 The relaxed Poisson measure


The previous discussion exhibited the formulation, starting from the primitives $(w(\cdot),N(\cdot),m(\cdot))$. We are now in a position to develop the needed extension of the Poisson measure, which is consistent with the motivating discussion and the model (2.1).
Let us restart from the beginning. Let $\{\mathcal{F}_t,t\ge0\}$ be a filtration, $w(\cdot)$ a standard $\mathcal{F}_t$-Wiener process, and let $m(\cdot)$ be an admissible relaxed control. Let the (counting) measure valued process $N_m(\cdot)$ have the property that for any Borel sets $\Gamma_0\subset\Gamma$ and $U_0\subset U$, the processes
$$N_m(t,\Gamma_0,U_0) - \lambda\Pi(\Gamma_0)\,m(t,U_0) \qquad (2.11)$$

are $\mathcal{F}_t$-martingales, and are orthogonal for disjoint $\Gamma_0\times U_0$. This martingale property and the fact that $m_t(\cdot)$ is $\mathcal{F}_t$-predictable constitutes the definition of admissibility. Such $N_m(\cdot)$ will be called relaxed Poisson measures. The martingale property and the fact that $N_m(\cdot)$ is a counting-measure valued process specifies the distribution of $N_m(\cdot)$ uniquely. The weak topology is to be used on the space of measures, whatever the type.
Write the stochastic differential equation with controlled jumps in terms of the relaxed Poisson measure as
$$x(t) = x + \int_0^t\int_U b(x(s),\alpha)\,m_s(d\alpha)\,ds + \int_0^t\sigma(x(s))\,dw(s) + J(t) + z(t), \qquad (2.12)$$
where
$$J(t) = \int_0^t\int_\Gamma\int_U q(x(s-),\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha). \qquad (2.13)$$

Under the conditions in the introduction to the chapter and (A2.1), there
is a unique (weak sense) solution to (2.12) for each initial condition. The
bounds on the reflection terms given in Theorem 11.1.1 continue to hold.
For the motivating problem where the jumps were represented by either of (2.4) or (2.7), the $N_{m^\delta}(ds\,d\gamma\,d\alpha)$ defined above (2.8) (when setting $m = m^\delta$) equals that defined here. Furthermore, it follows from Theorem 2.1 that $N_{m^\delta}(\cdot)$ converges weakly to a relaxed Poisson measure $N_m(\cdot)$ associated with the limit relaxed control $m(\cdot)$. Also,
$$\int_0^t\int_\Gamma\int_U q(x^\delta(s-),\gamma,\alpha)\,N_{m^\delta}(ds\,d\gamma\,d\alpha)$$
converges weakly to
$$\int_0^t\int_\Gamma\int_U q(x(s-),\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha),$$
and $(x^\delta(\cdot),z^\delta(\cdot),m^\delta(\cdot),N_{m^\delta}(\cdot))$ converges weakly to $(x(\cdot),z(\cdot),m(\cdot),N_m(\cdot))$.
In general, if there are a finite number of points $\{\alpha_i, i\le k\}$ on which $m_t(\cdot)$ is concentrated for almost all $\omega,t$, then the jump processes can be represented in terms of $k$ mutually independent and identically distributed Poisson measures.

Implications of the martingale property. By the martingale property of (2.11), for bounded and measurable $\phi(\cdot)$,
$$\int_0^t\int_\Gamma\int_U \phi(s,\gamma,\alpha)\,N_m(ds\,d\gamma\,d\alpha) - \lambda\int_0^t\int_\Gamma\int_U \phi(s,\gamma,\alpha)\,\Pi(d\gamma)\,m_s(d\alpha)\,ds$$
is an $\mathcal{F}_t$-martingale. This implies that the conditional (on $\mathcal{F}_t$) probability of a jump (when $\phi(s,\gamma,\alpha)=\gamma$) in any interval $[t,t+\delta)$ is $\lambda\delta + o(\delta)$. The
13.2 Controlled Jumps 363

probability of more than one jump is o( o). It also tells us that the jump
distribution is II(·) and that the jump value (given a jump) is independent
of the time of the jump. If x(·) is an .1"t-adapted process with paths in
Dr[O,oo), then the process defined by

{t { { q(x(s-),"'f,a)Nm(dsd"'(da)
lo lr lu
-A 1[fu
t
q(x(s-),"'f,a)IT(d"'()m 8 (da)ds
(2.14)

is an .1"t-martingale. The associated jump distribution is q(x(s-),"'f,a)


where "Y is distributed as IT(d"'f) and a as m 8 (da) independently. Thus, the
relaxed control plays the role of a randomization.

Next, let $\{\mathcal{F}^n_t, t \ge 0\}$, $w^n(\cdot)$, $m^n(\cdot)$, $N_{m^n}(\cdot)$ be a sequence of filtrations,
standard $\mathcal{F}^n_t$-Wiener processes, admissible controls, and relaxed $\mathcal{F}^n_t$-Poisson
measures, and let $x^n(\cdot)$, $z^n(\cdot)$ denote the associated solution and reflection
processes of (2.12) and (2.13). We have the following limit theorem.

Theorem 2.1. Under the conditions in the introduction to the chapter and
(A2.1), the set
$$\left\{(x^n(\cdot), z^n(\cdot), w^n(\cdot), m^n(\cdot), N_{m^n}(\cdot)),\ n < \infty\right\}$$
is tight. The limit of any weakly convergent subsequence satisfies (2.12) and
(2.13). Let $\{\mathcal{F}_t, t \ge 0\}$ denote the filtration induced by the limit. Then $w(\cdot)$
is a standard $\mathcal{F}_t$-Wiener process, $m(\cdot)$ is admissible, and $N_m(\cdot)$ is a relaxed
Poisson measure with compensator process defined by (2.9).

Comments on the Proof. Only a few comments will be made. Since
$\{EN_{m^n}(\cdot)\}$ is tight, the set $\{N_{m^n}(\cdot)\}$ is also tight [95, Theorem 1.6.1]. The
set $\{m^n(\cdot), w^n(\cdot)\}$ is always tight. The tightness of $\{z^n(\cdot)\}$ follows from the
arguments in Subsection 11.1.2 (see also [100, Theorem 3.6.1]). The sets of
interjump sections are tight, as are the sets of jumps. Suppose that (abusing
terminology) $n$ indexes a weakly convergent subsequence, with limit
denoted by $(x(\cdot), z(\cdot), w(\cdot), m(\cdot), N_m(\cdot))$, and let $\{\mathcal{F}_t, t \ge 0\}$ denote the
filtration engendered by the limit process. The nonanticipativity, the Wiener
and martingale properties, and the admissibility of $m(\cdot)$ follow by weak convergence
arguments of the type used in Theorems 10.4.1 and 11.1.2. Note,
in particular, that $N_m(\cdot)$ is a relaxed $\mathcal{F}_t$-Poisson measure associated with
$m(\cdot)$. Since $q(\cdot)$ is bounded and is continuous in $(x,\alpha)$ for each value of $\gamma$,
$$J^n(t) = \int_0^t\int_\Gamma\int_U q(x^n(s-),\gamma,\alpha)\,N_{m^n}(ds\,d\gamma\,d\alpha)$$
converges weakly to $J(\cdot)$. Now, piece the interjump limits and jump limits
together to get (2.12) and (2.13).

13.2.3 Existence and approximation of an optimal control

Existence of an Optimal Control. The weak sense uniqueness implies
that the jump terms and control can be approximated and that there is
an optimal control. Define $V_\beta(x) = \inf_m W_\beta(x,m)$, where the inf is over the
relaxed admissible controls and the system is (2.12) and (2.13). The weak
convergence in Theorem 2.1 and the fact that the $m^n(\cdot)$ can be chosen to
be $1/n$-optimal controls imply the following theorem.

Theorem 2.2. Assume one of the cost functions in (11.1.3), the conditions
in the introduction to the chapter, and (A2.1). Then there is an optimal
control of the relaxed problem.

Comment on the Proof. Let $(x^n(\cdot), z^n(\cdot), m^n(\cdot), w^n(\cdot), N_{m^n}(\cdot))$ denote
a minimizing sequence and, abusing terminology, let $n$ index a weakly convergent
subsequence with limit $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$. It follows from
Theorem 11.1.1 that (depending on whether (11.1.3a) or (11.1.3b) is used)
$$\sup_n E|z^n(t)|^2 = O(t),\qquad \sup_n E|y^n(t)|^2 = O(t). \tag{2.15}$$
This, the continuity and boundedness of $k(\cdot)$ on $G$, and the weak convergence
imply that
$$W_\beta(x, m^n) \to W_\beta(x, m) = \inf_m W_\beta(x, m) = V_\beta(x), \tag{2.16}$$
which is the theorem.

Approximating the Optimal Relaxed Control and Relaxed Poisson
Measure. Let a set $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$ solving (2.12) and
(2.13) be given. Let $\delta > 0$ and divide $U$ into a finite number of disjoint connected
subsets $U^\delta_i$, $i \le k_\delta$, with maximal diameters going to zero as $\delta \to 0$,
and let $\alpha^\delta_i$ be a point in $U^\delta_i$. Given an admissible relaxed control $m(\cdot)$,
define $m^\delta(\cdot)$ by $m^\delta(t,\alpha^\delta_i) = m(t,U^\delta_i)$, $i \le k_\delta$. Thus, $m(\cdot)$ is approximated
by an admissible relaxed control which is concentrated on a finite set. The
associated relaxed Poisson measure $N_{m^\delta}(\cdot)$ is obtained by restricting $N_m(\cdot)$
in the obvious way. Let $(x^\delta(\cdot), z^\delta(\cdot), m^\delta(\cdot), w^\delta(\cdot), N_{m^\delta}(\cdot))$ denote an associated
solution and driving processes for (2.12) and (2.13). The following
theorems are used to extend the approximation results in Subsection 10.1.2
and Section 10.3 to the problem with controlled jumps.

Theorem 2.3. Assume the conditions in the introduction to the chapter and
(A2.1). Then the set $(x^\delta(\cdot), z^\delta(\cdot), m^\delta(\cdot), w^\delta(\cdot), N_{m^\delta}(\cdot))$ converges weakly to
$(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$ and $W_\beta(x, m^\delta) \to W_\beta(x, m)$.

Theorem 2.4. Assume the conditions in the introduction to the chapter,
(A2.1), and that $U$ has only finitely many points $\{a_i, i \le k\}$. Let
$(w(\cdot), N(\cdot), m(\cdot))$ be a Wiener process, Poisson measure, and relaxed control,
respectively, with respect to some filtration. Define the piecewise constant
control $u^\Delta(\cdot)$ as follows. For $l \ge 1$, define $\tau^\Delta_i(l) = m(l\Delta, a_i) - m(l\Delta - \Delta, a_i)$
and divide each interval $[l\Delta, l\Delta + \Delta)$ into subintervals of
lengths $\tau^\Delta_1(l), \ldots, \tau^\Delta_k(l)$. Then use the control value $a_i$, $i \le k$, on the subintervals
successively. Let $m^\Delta(\cdot)$ denote the relaxed control representation
of $u^\Delta(\cdot)$. Let $N_{m^\Delta}(\cdot)$ denote the associated relaxed Poisson measure, and
$(x^\Delta(\cdot), z^\Delta(\cdot))$ the corresponding solution to (2.12) and (2.13). Then
$$(x^\Delta(\cdot), z^\Delta(\cdot), w^\Delta(\cdot), m^\Delta(\cdot), N_{m^\Delta}(\cdot))$$
converges weakly to $(x(\cdot), z(\cdot), w(\cdot), m(\cdot), N_m(\cdot))$, solving (2.12) and (2.13).
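
The time-sharing construction in Theorem 2.4 is straightforward to implement. The following is a minimal Python sketch (all names are illustrative) of how the piecewise constant control $u^\Delta(\cdot)$ can be assembled once the increments $\tau^\Delta_i(l)$ of the relaxed control have been computed; it is not part of the text's formal development.

```python
import numpy as np

def piecewise_constant_control(m_increments, control_points, Delta):
    """Sketch of the time-sharing construction of Theorem 2.4.

    m_increments[l-1, i] is assumed to hold tau_i^Delta(l) = m(l*Delta, a_i)
    - m(l*Delta - Delta, a_i), the mass the relaxed control assigns to the
    point a_i = control_points[i] on [l*Delta - Delta, l*Delta).  On the next
    interval [l*Delta, l*Delta + Delta) the ordinary control uses a_1, ..., a_k
    successively, each on a subinterval of length tau_i^Delta(l).
    Returns a function u(t) defined for t in [Delta, (L + 1)*Delta).
    """
    m_increments = np.asarray(m_increments, dtype=float)
    L, k = m_increments.shape
    # Right endpoints of the successive subintervals inside each interval.
    cum = np.cumsum(m_increments, axis=1)

    def u(t):
        l = int(t // Delta)              # interval index, 1 <= l <= L
        offset = t - l * Delta           # position inside [l*Delta, l*Delta + Delta)
        i = int(np.searchsorted(cum[l - 1], offset, side="right"))
        return control_points[min(i, k - 1)]

    return u
```

Because the $\tau^\Delta_i(l)$ sum to $\Delta$ over $i$, the subintervals exactly tile each interval of length $\Delta$.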

Infima over Ordinary Controls. Theorems 2.3 and 2.4 imply that the
infimum of the costs over the ordinary admissible controls equals the infi-
mum over the relaxed controls. Thus, the extension of the model via the
introduction of the relaxed Poisson measure does not affect the infimum of
the cost function.

Representation by a Standard Poisson Measure. Let $u(\cdot)$ be admissible,
piecewise constant, and take only finitely many values $\{a_i, i \le k\}$,
as in Theorem 2.4. Then the jump term can be represented in terms of a
standard Poisson measure. Let $I_i(t)$ be the indicator function of the event
that $u(t) = a_i$, which we can take to be a predictable process. Then the
jump term can be represented as
$$J(t) = \sum_{i\le k}\int_0^t\int_\Gamma I_i(s)\,q(x(s-),\gamma,a_i)\,N(ds\,d\gamma) \tag{2.17}$$
for a standard Poisson measure $N(\cdot)$ with jump rate $\lambda$ and jump distribution
$\Pi(\cdot)$.
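
For simulation purposes this representation is convenient, because a standard Poisson measure is easy to sample. The following rough Python sketch isolates the jump term; the jump rate, a sampler for $\Pi(\cdot)$, the pre-jump state path, and the piecewise constant control are all assumed to be supplied by the user, and the names are illustrative rather than drawn from the text.

```python
import numpy as np

def poisson_measure_arrivals(lam, T, sample_gamma, rng):
    """Arrival times and marks of a standard Poisson measure with rate lam and
    mark distribution Pi on [0, T]; sample_gamma(rng) is assumed to return one
    draw from Pi."""
    times, marks = [], []
    t = rng.exponential(1.0 / lam)
    while t <= T:
        times.append(t)
        marks.append(sample_gamma(rng))
        t += rng.exponential(1.0 / lam)
    return times, marks

def accumulated_jumps(times, marks, q, x_minus, u):
    """Accumulate the jump term of (2.17): at an arrival time s with mark gamma,
    the state jumps by q(x(s-), gamma, u(s)), where u(s) is the control value
    (one of the a_i) in force just before s and x_minus(s) returns x(s-)."""
    return sum(q(x_minus(s), g, u(s)) for s, g in zip(times, marks))
```

In a full simulation the jumps would, of course, feed back into the state path between arrivals; the sketch only isolates the jump term itself.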

13.2.4 Convergence of the numerical algorithm


The numerical problem of computing $V_\beta(x)$ with the system (2.1) or (2.12),
(2.13) uses the same algorithms as discussed in Section 5.6, with the control
added to $q(\cdot)$. Assume:

A2.2. The approximating Markov chain is locally consistent with (1.1),
and the step sizes for the "diffusion" steps go to zero as $h \to 0$.

Recall that the approximating chain is interpolated into a continuous
parameter process $\psi^h(\cdot)$, which can be represented in the following form:
$$\psi^h(t) = \psi^h(0) + \int_0^t b(\psi^h(s),u^h(s))\,ds + \int_0^t \sigma(\psi^h(s))\,dw^h(s) + J^h(t) + z^h(t) + \varepsilon^h(t), \tag{2.18}$$
where $u^h(\cdot)$ is the control process, $\varepsilon^h(\cdot)$ converges weakly to the "zero"
process, and $z^h(\cdot)$ is the reflection term. The jump term can be represented as
$$J^h(t) = \int_0^t\int_{\Gamma^h} Q^h(\psi^h(s),\gamma,u^h(s))\,N^h(ds\,d\gamma), \tag{2.19}$$
where $N^h(\cdot)$ is a "proto Poisson measure," and $Q^h(x,\gamma,\alpha) - q(x,\gamma,\alpha) \to 0$
uniformly in $(x,\gamma,\alpha)$. Although it is confined to having jumps only at the
ends of the interpolation intervals, it has the "effective" jump rate $\lambda$ and
jump distribution $\Pi^h(\cdot)$, an approximation which converges weakly to $\Pi(\cdot)$
as $h \to 0$. As $h \to 0$, the pair $(w^h(\cdot), N^h(\cdot))$ converges weakly to a Wiener
process and Poisson measure.
Define $N^h_m(\cdot)$ analogously to the way that $N_m(\cdot)$ was defined above
(2.8). Rewrite (2.18) in terms of the relaxed control representation as
$$\psi^h(t) = \psi^h(0) + \int_0^t\int_U b(\psi^h(s),\alpha)\,m^h_s(d\alpha)\,ds + \int_0^t \sigma(\psi^h(s))\,dw^h(s) + J^h(t) + z^h(t) + \varepsilon^h(t), \tag{2.20}$$
$$J^h(t) = \int_0^t\int_{\Gamma^h}\int_U Q^h(\psi^h(s),\gamma,\alpha)\,N^h_m(ds\,d\gamma\,d\alpha). \tag{2.21}$$
Henceforth, we assume the conditions in the introduction to the chapter,
(A2.1), and (A2.2). From this point on, the proof that $(\psi^h(\cdot), z^h(\cdot), m^h(\cdot), w^h(\cdot), N^h_m(\cdot))$
converges weakly to $(x(\cdot), z(\cdot), m(\cdot), w(\cdot), N_m(\cdot))$ is essentially
that of Theorem 2.1. The weak convergence implies that
$$\liminf_h V^h(x) \ge V_\beta(x) \tag{2.22}$$
as in Theorems 10.5.1 and 11.1.5.


Let f > 0 be arbitrary. The proof that

lim sup v;(x) $ V,a(x) (2.23)


h

in Theorem 10.5.2 used an €-optimal control that was piecewise constant,


took only a finite number of values, and was determined by the conditional
probability rule (10.5.8). In the present case, Theorems 2.3 and 2.4 imply
that attention can be restricted to the case where the control is piecewise
constant and takes only a finite number of values. In this case, (2.12) re-
duces to (2.1), and the rule (10.5.8) can be used to get (2.23) just as in
Theorem 10.5.2.
14
Problems from the Calculus of Variations: Finite Time Horizon

A large class of deterministic optimal control problems are special cases of


the stochastic optimal control problems considered previously. This is true
both with respect to the construction of schemes as well as the proofs of
convergence. In fact, the convergence proofs become much simpler in the
deterministic setting.
In the present chapter and the next we will consider deterministic op-
timal control problems which are not special cases. We have two goals in
mind. The first is to show the flexibility of the Markov chain approximation
methods with regard to weakening assumptions. The second is to discuss
several classes of problems that are of current interest but not covered by
the results given so far. To contain the development somewhat, we will
focus on the construction of numerical schemes and proofs of convergence
for problems from the calculus of variations. Such problems arise in a wide
variety of settings. Well known examples are classical mechanics and geo-
metric optics (e.g., [28, 62]). A more recent example is the theory of large
deviations of stochastic processes [59]. In many ways these problems are
simpler than most of the problems treated previously. There are, however,
some interesting new features that must be dealt with. For example, when
we rewrite a calculus of variations problem as a control problem, the space
of controls is usually unbounded. In this case, the notion of local consistency
must be extended so that the "errors" in the approximation are properly
bounded as a function of the control. The tightness of controls (in the
relaxed control framework) is no longer automatic, and must be shown to
follow from the formulation of the optimization problem (at least for any
sequence of nearly optimal controls). These issues are considered in detail

in Section 14.2.
In many problems we must also consider costs that are discontinuous.
If the discontinuity is in the stopping or exit cost, then convergence can
be proved under a mild "controllability" condition. Discontinuities of this
sort are considered in Section 14.2. The problem becomes much more dif-
ficult if the discontinuity is in the running cost. In the case of stochastic
control problems, such discontinuities may essentially be ignored (with re-
spect to the convergence of the numerical schemes) if the optimally con-
trolled process induces a measure on path space under which the total cost
is continuous w.p.1 (see the remark after Theorem 10.1.1). When the un-
derlying processes are deterministic the situation is quite different. In this
case, properly dealing with the discontinuity becomes the main focus of in-
terest. Although the dynamics are deterministic, the natural formulation of
the cost along the discontinuity involves a "randomization" of the costs on
either side, and so one might expect probabilistic methods to be effective.
We will see that the Markov chain method continues to work well and, in
fact, provides a very natural way to deal with a difficult problem for which
there are no alternative methods.
Many of the complicating features we will discuss can also occur in stan-
dard stochastic and deterministic control problems. The interested reader
can combine the methods used in this chapter and the next with those
introduced previously to treat some of these generalizations.
An outline of the chapter is as follows. In Section 14.1 we consider a fixed
time interval and suppose that the cost is the sum of a continuous running
cost and a terminal cost. This is the calculus of variations analogue of the
control problem treated in Chapter 12, and is known as a Bolza problem.
In Section 14.2 we describe the numerical schemes and give the proof of
convergence. Problems where the running cost is discontinuous in the state
variable are considered Section 14.3. Problems with a controlled stopping
time are the topic of Chapter 15.

14.1 Problems with a Continuous Running Cost


Let $k : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ denote a running cost. We will assume throughout
this chapter and the next that $k(\cdot,\cdot)$ satisfies the following uniform
superlinear growth condition:
$$\liminf_{c\to\infty}\ \inf_x\ \inf_{\alpha:|\alpha|=c} k(x,\alpha)/c = +\infty. \tag{1.1}$$

This condition is natural and holds in most applications. Under (1.1) there
exists a convex function $l : [0,\infty) \to (-\infty,+\infty]$ which is bounded from
below and satisfies
$$k(x,\alpha) \ge l(|\alpha|) \text{ for all } (x,\alpha), \quad\text{and}\quad \lim_{c\to\infty} l(c)/c = \infty. \tag{1.2}$$

In previous chapters, we have always assumed that the control space is


compact. Thus, tightness of the sequence of relaxed controls appearing in
the numerical schemes was automatic. This will no longer be assumed. The
condition (1.1) will be used in lieu of the compactness to force tightness
of relaxed controls, at least for the sequence of optimal relaxed controls
associated with the approximations to the value function. The function
g( ·) will be a stopping cost, and we will always assume g( ·) is bounded
from below.
Fix $T > 0$. In the finite time problem, we seek
$$V(x) = \inf\left[\int_0^T k(\phi(s),\dot\phi(s))\,ds + g(\phi(T))\right], \tag{1.3}$$
where the infimum is over all absolutely continuous functions $\phi : [0,T] \to \mathbb{R}^k$
satisfying $\phi(0) = x$. For notational convenience we rewrite this problem
as an optimal control problem. The set of admissible controls for this
problem will consist of all measurable functions from $[0,T]$ to $\mathbb{R}^k$. Let $u(\cdot)$
be any admissible control. The dynamics of the controlled process are then
given simply by $\dot x(t) = u(t)$, $x(0) = x$, and the cost to be minimized is
$$W(x,u) = \int_0^T k(x(s),u(s))\,ds + g(x(T)).$$

More generally, we could consider the problem of approximating
$$V(x,t) = \inf\left[\int_t^T k(\phi(s),\dot\phi(s))\,ds + g(\phi(T))\right],$$
where the infimum is over all absolutely continuous functions $\phi : [t,T] \to \mathbb{R}^k$
satisfying $\phi(t) = x$. This is the calculus of variations analogue of the
problems treated in Chapter 12. To simplify the notation, we consider only
$V(x)$, but note that the approximation schemes that are derived below
actually yield approximations to $V(x,t)$ for all $t \in [0,T]$.
Just as in Chapter 3, it is possible to formally derive a Bellman equation
for the cost $V(x,t)$, at least for the case where $k(\cdot,\cdot)$ is continuous. For the
problem under consideration, the differential operator takes the particularly
simple form
$$\mathcal{L}^\alpha f(x) = f_x'(x)\,\alpha,$$
where the control variable $\alpha$ takes values in $\mathbb{R}^k$. The formal Bellman equation
is then given by
$$\begin{cases} V_t(x,t) + \inf_{\alpha\in\mathbb{R}^k}\left[\mathcal{L}^\alpha V(x,t) + k(x,\alpha)\right] = 0, \\ V(x,T) = g(x). \end{cases}$$
It is sometimes the case that the functions k( ·) and g( ·) satisfy continuity
conditions. However, there are many interesting applications where this is

not true. For example, it may be the case that the path $\phi(\cdot)$ is required to
stay in some closed set $G$ over the interval $[0,T]$. This can be incorporated
into the problem by defining $k(x,\alpha)$ to be $+\infty$ for $x \notin G$. It is also possible
that constraints are placed on $\dot\phi(\cdot)$ or on the location of $\phi(T)$. These can
also be incorporated into the problem as given above by suitably redefining
$k(\cdot)$ and $g(\cdot)$. In general, these control and state space constraints can be
readily dealt with (when constructing the numerical method and proving
its convergence). See, for example, the convergence theorems of Section
14.2 and Chapter 15. A case with a discontinuity that is more difficult to
deal with appears in Section 14.3.
For numerical purposes we may require that the state space be bounded.
One method for bounding that produces an algorithm which is simple to
program is to simply stop $\phi(\cdot)$ at the first time $\tau$ that it leaves the interior
of a suitable set, such as
$$G = \{x : c_i \le x_i \le d_i \text{ for } i = 1,2,\ldots,k\},$$
at which time a stopping cost $g(\phi(\tau),\tau)$ may be assessed. This would add a
Dirichlet boundary condition to the Bellman equation above. For notational
convenience we will combine the terminal and stopping costs as one function
$g(x,t)$, and for later purposes we will assume that $g(\cdot,\cdot)$ is defined on all of
$\mathbb{R}^k \times [0,T]$. The calculus of variations problem then becomes
$$V(x) = \inf\left[\int_0^{T\wedge\tau} k(\phi(s),\dot\phi(s))\,ds + g(\phi(T\wedge\tau),\,T\wedge\tau)\right]. \tag{1.4}$$
Alternatively, one can use the analogue of the "reflecting" boundary condition
for diffusions that was introduced in Section 1.4.
Fix a particular $x \in G$. For either of these methods of bounding the
state space, a condition that is sufficient to guarantee that the value for
the original problem (1.3) and the value of the modified problem (1.4) are
the same is that the minimizing trajectories for the two problems be the
same and remain in $G^0$. It is often the case that properties of the functions
$k(\cdot,\cdot)$ [e.g., the lower bound $l(\cdot)$] and $g(\cdot)$ can be exploited to obtain a bound
on the range
$$R = \{\phi(t) : 0 \le t \le T\},$$
where $\phi(\cdot)$ starts at $x$ at time $0$ and is a minimizing trajectory for the
problem with no boundary. In such a case, $G^0$ should be chosen to contain
$R$. By imposing a suitably large stopping cost on the set $\{(x,t) : x \in \partial G,\ t \in [0,T)\}$,
it can be assured that the minimizing trajectories for the
original and bounded problems are the same and remain in $G^0$.
Owing to its practical importance with regard to numerical implemen-
tation, we will use the problem (1.4) as our canonical example of a finite
time problem. In particular, the convergence of numerical schemes will be
demonstrated for this problem. It is worth noting that under appropriate

additional conditions, problems with a time dependent running cost can be


handled by the methods developed below.

14.2 Numerical Schemes and Convergence


We will now set up the Markov chain approximation for the numerical
method. The dynamical equation for the optimal control problem of the
previous section is simply $\dot x(t) = u(t)$, where the control process $u(\cdot)$ takes
values in $\mathbb{R}^k$. Suppose we fix a grid $S_h \subset \mathbb{R}^k$. Since the dynamics are
independent of the state, we will assume $\Delta t^h(x,\alpha)$ and the distribution
of $y - x$ under $p^h(x,y|\alpha)$ are independent of $x$. State dependency of these
quantities can be put back in with little effort if it is desired. The local
consistency conditions (4.1.3) thus become
$$E^{h,\alpha}_{x,n}\,\Delta\xi^h_n = \alpha\,\Delta t^h(\alpha) + o(|\alpha|\,\Delta t^h(\alpha)), \tag{2.1}$$
$$E^{h,\alpha}_{x,n}\left[\Delta\xi^h_n - E^{h,\alpha}_{x,n}\Delta\xi^h_n\right]\left[\Delta\xi^h_n - E^{h,\alpha}_{x,n}\Delta\xi^h_n\right]' = o(|\alpha|\,\Delta t^h(\alpha)), \tag{2.2}$$
where $\{\xi^h_n, n < \infty\}$ is a controlled Markov chain taking values in $S_h$ and
$\Delta\xi^h_n = \xi^h_{n+1} - \xi^h_n$. Due to the unbounded nature of the control space, care
must be taken regarding how the "errors" in the local consistency equations
depend on $\alpha$. The precise meaning of (2.1) is that
$$\frac{\left|E^{h,\alpha}_{x,n}\Delta\xi^h_n - \alpha\,\Delta t^h(\alpha)\right|}{|\alpha|\,\Delta t^h(\alpha)} \to 0 \quad\text{as } h \to 0,$$
uniformly for $\alpha \in \mathbb{R}^k$, and similarly for (2.2). All chains considered in this
chapter and the next are assumed to be locally consistent in this sense. We
will also assume throughout the rest of this chapter that $\Delta t^h(\alpha) \to 0$ for
each $\alpha \in \mathbb{R}^k$ and that
$$\lim_{h\to 0}\sup\{|y - x| : p^h(x,y|\alpha) > 0,\ x \in S_h,\ y \in S_h,\ \alpha \in \mathbb{R}^k\} = 0.$$
Example 2.1. Suppose that we use the grid
$$S_h = \left\{x : x = h\sum_{i=1}^k e_i n_i,\ n_i = 0, \pm 1, \pm 2, \ldots\right\}.$$
Then an obvious choice for the transition function is
$$p^h(x, x \pm he_i|\alpha) = \frac{\alpha_i^{\pm}}{\sum_j |\alpha_j|},\qquad p^h(x,y|\alpha) = 0 \text{ otherwise},$$
where $\alpha_i^+ = \max\{\alpha_i,0\}$ and $\alpha_i^- = \max\{-\alpha_i,0\}$,
and $\Delta t^h(\alpha) = h\left(\sum_j |\alpha_j|\right)^{-1}$. This definition actually makes sense only when
$\alpha \ne 0$, and we take care of the omitted case by setting
$$p^h(x,y|0) = \begin{cases} 1, & \text{if } y = x, \\ 0, & \text{otherwise}. \end{cases}$$
The value $\Delta t^h(0) > 0$ is arbitrary [for the purposes of satisfying (2.1)
and (2.2)] and for simplicity we take $\Delta t^h(0) = h$. •
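
To make the construction concrete, here is a minimal Python sketch (with illustrative names) of the transition law and interpolation interval of Example 2.1. One can check that the one-step mean is exactly $\alpha\,\Delta t^h(\alpha)$, so (2.1) holds with no error term, while the covariance is of order $h\,|\alpha|\,\Delta t^h(\alpha)$, consistent with (2.2).

```python
import numpy as np

def transition_and_interval(alpha, h):
    """Transition probabilities and interpolation interval of Example 2.1 (a sketch).

    For a control alpha in R^k, the chain moves from x to x + h*e_i with
    probability max(alpha_i, 0) / sum_j |alpha_j|, and to x - h*e_i with
    probability max(-alpha_i, 0) / sum_j |alpha_j|; the interpolation interval
    is Delta t^h(alpha) = h / sum_j |alpha_j|.  The case alpha = 0 keeps the
    chain where it is, with Delta t^h(0) = h.
    Returns a list of (displacement, probability) pairs and Delta t^h(alpha).
    """
    alpha = np.asarray(alpha, dtype=float)
    k = alpha.size
    denom = np.sum(np.abs(alpha))
    if denom == 0.0:
        return [(np.zeros(k), 1.0)], h
    moves = []
    for i in range(k):
        e = np.zeros(k)
        e[i] = h
        if alpha[i] > 0:
            moves.append((e, alpha[i] / denom))
        elif alpha[i] < 0:
            moves.append((-e, -alpha[i] / denom))
    return moves, h / denom
```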
We will require for the finite time problem that $\Delta t^h(\alpha)$ be independent
of the control. Following the development of Chapter 12, we next derive
explicit and implicit schemes from transition probabilities satisfying the
local consistency conditions (2.1) and (2.2).

14.2.1 Descriptions of the numerical schemes

Explicit Schemes. Because $u(\cdot)$ may be unbounded, care must be taken
with respect to the relationship between the time discretization and the
spatial discretization. With the definitions [analogous to (12.2.3) and (12.2.1)]
$$p^{h,\delta}(x,x|\alpha) = 1 - \frac{\delta}{\Delta t^h(\alpha)}, \tag{2.3}$$
$$p^{h,\delta}(x,y|\alpha) = p^h(x,y|\alpha)\left(1 - p^{h,\delta}(x,x|\alpha)\right), \tag{2.4}$$
we find that $p^{h,\delta}(x,y|\alpha)$ defines a locally consistent chain in the sense of
(2.1) and (2.2) [but with $\delta$ replacing $\Delta t^h(\alpha)$ there] if we restrict $\alpha$ to the
set
$$U^{h,\delta} = \{\alpha : \delta/\Delta t^h(\alpha) \le 1\}. \tag{2.5}$$
If we assume $\Delta t^h(0) \ge \delta$, then for the case of Example 2.1 we have
$$U^{h,\delta} = \left\{\alpha : \sum_{j=1}^k |\alpha_j| \le h/\delta\right\}.$$

It may be that bounds on the optimal control for the original calculus
of variations problem are available. For example, it may be that one can
calculate $B$ such that
$$\operatorname*{ess\,sup}_{t\in[0,T]}\ \sum_j |\dot\phi_j(t)| \le B,$$
where ess sup stands for essential supremum and $\phi(\cdot)$ is a minimizing path.
Then we must assume conditions which guarantee that the restricted control
space $U^{h,\delta}$ is eventually "big enough." For example, in the case of
Example 2.1 we must assume that the pair $(h,\delta)$ is sent to zero in such a
way that
$$\liminf_{h,\delta}\ h/\delta \ge B.$$
If such a bound is not available, we must assume
$$\liminf_{h,\delta}\ U^{h,\delta} = \mathbb{R}^k, \tag{2.6}$$
i.e., $h$ and $\delta$ must be sent to zero in such a way that given any $u \in \mathbb{R}^k$ we
have $u \in U^{h,\delta}$ for all sufficiently small $h > 0$ and $\delta > 0$.
Now let $G_h = G \cap S_h$ and $G^0_h = G^0 \cap S_h$. Then the explicit approximation
scheme for solving (1.4) is
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in U^{h,\delta}}\left[\sum_y p^{h,\delta}(x,y|\alpha)\,V^{h,\delta}(y,n\delta+\delta) + k(x,\alpha)\,\delta\right] \tag{2.7}$$
for $x \in G^0_h$ and $n\delta < T$, together with the boundary and terminal condition
$V^{h,\delta}(x,n\delta) = g(x,n\delta)$ for $x \notin G^0_h$ and $n\delta \le T$, or $x \in G^0_h$ and $n\delta = T$.
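
As an illustration of how (2.7) is iterated backward in time, here is a one-dimensional Python sketch using the transition probabilities of Example 2.1. All names are illustrative; in particular, the minimization over $U^{h,\delta}$ is carried out over a finite grid of control values, which is a simplification of the exact minimization, and the grid endpoints are treated as boundary points carrying the cost $g$.

```python
import numpy as np

def explicit_scheme_1d(k, g, G, T, h, delta):
    """One-dimensional sketch of the explicit scheme (2.7) with the transitions
    of Example 2.1.  k(x, a) is the running cost, g(x, t) the stopping/terminal
    cost, and G = (c, d) the interval whose interior is the state space.
    Controls are restricted to U^{h,delta} = {a : |a| <= h/delta}.
    Returns the grid and the approximation V^{h,delta}(., 0)."""
    c, d = G
    xs = np.arange(c, d + h / 2, h)                   # grid G_h
    interior = (xs > c) & (xs < d)                    # G_h^0
    n_steps = int(round(T / delta))
    controls = np.linspace(-h / delta, h / delta, 41)  # finite subset of U^{h,delta}

    V = np.array([g(x, T) for x in xs])               # terminal condition at t = T
    for n in range(n_steps - 1, -1, -1):
        t = n * delta
        V_new = np.array([g(x, t) for x in xs])       # boundary values outside G_h^0
        for j in np.where(interior)[0]:
            x = xs[j]
            best = np.inf
            for a in controls:
                p_move = delta * abs(a) / h           # probability of moving one grid step
                jn = j + 1 if a > 0 else j - 1
                val = (1 - p_move) * V[j] + p_move * V[jn] + k(x, a) * delta
                best = min(best, val)
            V_new[j] = best
        V = V_new
    return xs, V
```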

Implicit Schemes. Following Section 12.4, we define a transition function
on the $(x,t)$-grid $S_h \times \{0, \delta, 2\delta, \ldots\}$ by
$$\tilde p^{h,\delta}(x,n\delta;\,x,n\delta+\delta|\alpha) = \frac{\Delta t^h(\alpha)}{\Delta t^h(\alpha) + \delta}, \tag{2.8}$$
$$\tilde p^{h,\delta}(x,n\delta;\,y,n\delta|\alpha) = p^h(x,y|\alpha)\,\frac{\delta}{\Delta t^h(\alpha) + \delta}, \tag{2.9}$$
$$\Delta\tilde t^{h,\delta}(\alpha) = \frac{\delta\,\Delta t^h(\alpha)}{\Delta t^h(\alpha) + \delta}, \tag{2.10}$$
where $p^h(x,y|\alpha)$ and $\Delta t^h(\alpha)$ satisfy the local consistency conditions (2.1)
and (2.2). We also retain from Chapter 12 the notation $\zeta^{h,\delta}_n$ for the controlled
Markov chain and $(\zeta^{h,\delta}_{n,1}, \zeta^{h,\delta}_{n,0})$ to denote separately the "spatial" and
"temporal" components.

Define $N^{h,\delta}(T) = \min\{n : \zeta^{h,\delta}_{n,0} \ge T\}$. Then the implicit scheme for
solving (1.4) is
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in\mathbb{R}^k}\Bigg[\sum_y \tilde p^{h,\delta}(x,n\delta;\,y,n\delta|\alpha)\,V^{h,\delta}(y,n\delta) + \tilde p^{h,\delta}(x,n\delta;\,x,n\delta+\delta|\alpha)\,V^{h,\delta}(x,n\delta+\delta) + k(x,\alpha)\,\Delta\tilde t^{h,\delta}(\alpha)\Bigg] \tag{2.11}$$
for $x \in G^0_h$ and $n\delta < T$, and with the same boundary condition as for (2.7).
Note that for the implicit scheme we need not require that $h$ and $\delta$ satisfy
any special relationship as they tend to zero.

14.2.2 Approximations and properties of the value function


In this subsection we state some properties of the controls and the value
function that are needed in the convergence proofs. Let $B_a(x) = \{y : |x - y| < a\}$.
We will use the following assumptions.

A2.1. The set $G$ is compact and satisfies interior and exterior cone conditions:
there exist $\epsilon > 0$ and continuous functions $v(\cdot)$ and $w(\cdot)$ such that
given any $x \in \partial G$, $\bigcup_{0<a<\epsilon} B_{\epsilon a}(x + a v(x)) \subset G$ and $\bigcup_{0<a<\epsilon} B_{\epsilon a}(x + a w(x)) \cap G = \emptyset$.

A2.2. The function $k(\cdot,\cdot)$ is continuous and satisfies the superlinear growth
condition (1.1). There exist $M < \infty$ and $f : \mathbb{R} \to [0,\infty)$ satisfying $f(t) \to 0$
as $t \to 0$, such that for all $x$ and $y$,
$$\sup_\alpha\left[|k(x,\alpha) - k(y,\alpha)| - f(|x-y|)\left(M + k(x,\alpha)\right)\right] \le 0.$$

A2.3. The function $g(\cdot,\cdot)$ is uniformly continuous and bounded when restricted
to either of the sets $(\mathbb{R}^k - G^0) \times [0,T]$ and $G^0 \times \{T\}$.
Remarks. The condition (A2.2) occurs often in calculus of variations prob-
lems of the type we consider. See, for example, [59, Chapter 5]. The con-
tinuity assumption in (A2.2) is intermediate between continuity in x that
is uniform in a (which is much too restrictive) and simple continuity. Note
that (A2.3) does not assume continuity of g(·, ·). Continuity would not
be reasonable, since g models both the stopping cost and the terminal
cost. We have chosen the given form for g(·, ·) because it is common in
applications. Owing to the "controllability" of the dynamics ¢(t) = u(t),
discontinuities in g(·, ·) are generally not too difficult to deal with. In par-
ticular, it follows from (A2.1) and (A2.2) that the infimum in (1.4) is the
same if g(·,·) is replaced by g*(·,·), where g*(x,t) = g(x,t) fort< T and
g*(x, T) =limE-to inf{g(y, T) : ly-xl ~ t:}. We assume without loss of gen-
erality (be redefining k off G if need be), that k(x, 0) is uniformly bounded
for x E JRk.
It is sometimes desirable to weaken (A2.1). For a simple example, con-
sider the case of control until a target set is reached when the target set is
a single point. For obvious reasons, such target sets are not typically con-
sidered in stochastic control problems. However, they do appear often in
deterministic control. Clearly, the interior cone condition is satisfied at the
point, but the exterior cone condition is not. In the example of Subsection
15.3.3 we show how to extend the convergence proofs to cover such cases.

Definition. The set of admissible relaxed controls for this class of deter-
ministic problems consists of all measures m( ·) on the Borel subsets of
JRk x [0, oo) which satisfy m(JRk x [0, t]) = t for all t E [0, oo ).

For notational convenience we continue the convention of assuming con-


trols are defined for all time, even if they are only applied over a finite
interval.
Theorem 2.2. Assume condition (A2.2). Suppose that a relaxed control
$m(d\alpha\,ds) = m_s(d\alpha)\,ds$ is given such that for
$$x(t) = x + \int_0^t\int_{\mathbb{R}^k}\alpha\,m_s(d\alpha)\,ds,$$
we have
$$\int_0^T\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds < \infty.$$
Let $\epsilon > 0$ be given. Then there exist $\delta > 0$ and a finite set $\{\alpha^\epsilon_1,\ldots,\alpha^\epsilon_{k_\epsilon}\} = U_\epsilon \subset \mathbb{R}^k$
with the following properties. There is a function $u^\epsilon : [0,T] \to U_\epsilon$
which is constant on intervals of the form $[i\delta, i\delta+\delta)$, and such that if
$$x^\epsilon(t) = x + \int_0^t u^\epsilon(s)\,ds,$$
then
$$\sup_{0\le t\le T}|x^\epsilon(t) - x(t)| \le \epsilon$$
and
$$\sup_{0\le t\le T}\left|\int_0^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds - \int_0^t k(x^\epsilon(s),u^\epsilon(s))\,ds\right| \le \epsilon.$$

Proof. The theorem is essentially a simpler version of Theorem 10.1.2. The
main new difficulty in the current setup is the unboundedness of the control
space. We will prove that given $\epsilon > 0$ there is an admissible relaxed control
$m^\epsilon(\cdot)$ and a compact set $K^\epsilon$ such that $m^\epsilon(\cdot)$ is supported on $K^\epsilon \times [0,T]$,
and such that if $x^\epsilon(\cdot)$ is the associated solution, then
$$\sup_{0\le t\le T}\left|\int_0^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds - \int_0^t\int_{\mathbb{R}^k} k(x^\epsilon(s),\alpha)\,m^\epsilon_s(d\alpha)\,ds\right| \le \epsilon$$
and
$$\sup_{0\le t\le T}|x^\epsilon(t) - x(t)| \le \epsilon.$$
Given the existence of such a control, the proof of existence of a control
$u^\epsilon(\cdot)$ satisfying all the conditions of the theorem follows from Theorem
10.1.2.
The lower bound (1.2) implies $k(x,\alpha)$ is nonnegative whenever $|\alpha|$ is
sufficiently large. By the dominated convergence theorem and (1.2), there
exists $c(\epsilon) < \infty$ such that
$$\int_0^T\int_{\mathbb{R}^k} I_{\{|\alpha|\ge c(\epsilon)\}}\,|k(x(s),\alpha)|\,m(d\alpha\,ds) \le \epsilon/2 \tag{2.12}$$
and
$$\int_0^T\int_{\mathbb{R}^k} I_{\{|\alpha|\ge c(\epsilon)\}}\,|\alpha|\,m(d\alpha\,ds) \le \epsilon.$$
This last inequality implies
$$\sup_{0\le t\le T}\left|\int_0^t\int_{\mathbb{R}^k} I_{\{|\alpha|\ge c(\epsilon)\}}\,\alpha\,m(d\alpha\,ds)\right| \le \epsilon. \tag{2.13}$$
We can also assume $c(\epsilon)$ is sufficiently large so that
$$\left(\sup_{x\in\mathbb{R}^k}|k(x,0)|\right)\left(\int_0^T\int_{\mathbb{R}^k} I_{\{|\alpha|\ge c(\epsilon)\}}\,m(d\alpha\,ds)\right) \le \epsilon/2. \tag{2.14}$$

The desired $m^\epsilon(\cdot)$ can then be obtained by defining
$$m^\epsilon_s(B) = m_s(B \cap \{\alpha : |\alpha| < c(\epsilon)\}) + m_s(\{\alpha : |\alpha| \ge c(\epsilon)\})\,I_B(0)$$
for all Borel sets $B$ and $s \ge 0$. For this choice of $m^\epsilon(\cdot)$ we have
$$\sup_{0\le t\le T}|x^\epsilon(t) - x(t)| \le \epsilon$$
because of (2.13). Therefore, by (A2.2), (2.12), and (2.14),
$$\sup_{0\le t\le T}\left|\int_0^t\int_{\mathbb{R}^k} k(x^\epsilon(s),\alpha)\,m^\epsilon_s(d\alpha)\,ds - \int_0^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds\right| \le \epsilon + f(\epsilon)\left(MT + \int_0^T\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds\right).$$


As was observed in Chapters 9 and 10, for problems with boundary conditions
one must pay particular attention to the manner in which controlled
trajectories reach the boundary. A previously discussed method for proving
the convergence of costs involved showing the continuity of the first exit
time of $\phi$ when $\phi$ was a sample path of an underlying process (at least
w.p.1). See the discussion in Section 10.2. Here, the deterministic nature of
the problem makes this approach not applicable. Instead of requiring that
the exit time be continuous for all paths $x$ that solve $x(t) = x + \int_0^t u(s)\,ds$ for
arbitrary admissible controls (which is impossible), we shall require only the
existence of an $\epsilon$-optimal control such that the exit time is continuous at
the associated solution. This condition will always hold under our assumptions
[except when we weaken (A2.1) in Subsection 15.3.3] and turns out
to be sufficient for the proof of the upper bound $\limsup_{h\to 0} V^h(x) \le V(x)$.
To handle the lower bound $\liminf_{h\to 0} V^h(x) \ge V(x)$, we will need the
following result.
Theorem 2.3. Assume conditions (A2.1) and (A2.2). Let $m(\cdot)$ be any
admissible relaxed control, $x \in G^0$, and let $x(t) = x + \int_0^t\int_{\mathbb{R}^k}\alpha\,m(d\alpha\,ds)$.
Fix any $\sigma < \infty$, and assume that $x(t) \in G$ for $t \in [0,\sigma]$, and that
$$\int_0^\sigma\int_{\mathbb{R}^k} k(x(s),\alpha)\,m(d\alpha\,ds) < \infty. \tag{2.15}$$
Then given $\epsilon > 0$ there exists a control $m^\epsilon(\cdot)$ with associated solution $x^\epsilon(\cdot)$
such that
$$x^\epsilon(t) \in G^0 \text{ for } t \in [0,\sigma),\qquad x^\epsilon(0) = x(0),\qquad x^\epsilon(\sigma) = x(\sigma),$$
and
$$\left|\int_0^\sigma\int_{\mathbb{R}^k} k(x(s),\alpha)\,m(d\alpha\,ds) - \int_0^\sigma\int_{\mathbb{R}^k} k(x^\epsilon(s),\alpha)\,m^\epsilon(d\alpha\,ds)\right| \le \epsilon.$$

Proof. The basic idea of the proof is to show that if the path $x(\cdot)$ is on $\partial G$
at any time $t \in (0,\sigma)$, then the interior cone condition and the continuity
properties of the running cost imply that by "pushing" the path by a small
amount in the $v(x(t))$ direction, we obtain a path that is interior to $G$ at
time $t$ and that has nearly the same running cost as $x(\cdot)$.

We now proceed to the proof. Assume the theorem is false. Then there
exists a control $m(\cdot)$ with associated solution $x(\cdot)$ for which the conclusion
is not true. Define $A$ to be the set of all $t \in [0,\sigma)$ such that given any
$\epsilon > 0$ there exists a control $m^*(\cdot)$ with associated solution $x^*(\cdot)$ with the
properties that $x^*(s) \in G^0$ for all $s \in [0,t)$, $x^*(0) = x(0)$, $x^*(s) = x(s)$ and
$m^*_s(\cdot) = m_s(\cdot)$ for $s \in [t,\sigma]$, and
$$\left|\int_0^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m(d\alpha\,ds) - \int_0^t\int_{\mathbb{R}^k} k(x^*(s),\alpha)\,m^*(d\alpha\,ds)\right| < \epsilon.$$
Define $\sigma^*$ to be the supremum of the set $A$. Our assumption concerning $m(\cdot)$
implies $\sigma^* < \sigma$, which we now show is false. For simplicity we will assume
that $\sigma^*$ is not only the supremum of $A$ but also that $\sigma^* \in A$. The proof
when $\sigma^* \notin A$ is essentially the same but notationally more complicated.
By (A2.1) and the fact that $x(t) \in G$ for $t \in [\sigma^*,\sigma]$, there is $\nu \in (0,\sigma-\sigma^*)$
such that
$$x(t) + \gamma v(x(\sigma^*)) \in G^0 \tag{2.16}$$
for all $t \in [\sigma^*,\sigma^*+\nu]$ and $\gamma \in (0,\nu]$.

Let $\epsilon > 0$ be given, and let $m^*(\cdot)$ be an admissible control such that
$x^*(t) \in G^0$ for $t \in [0,\sigma^*)$, $x^*(0) = x(0)$, $x^*(t) = x(t)$ and $m^*_t(\cdot) = m_t(\cdot)$ for
$t \in [\sigma^*,\sigma]$, and such that
$$\left|\int_0^{\sigma^*}\int_{\mathbb{R}^k} k(x(s),\alpha)\,m(d\alpha\,ds) - \int_0^{\sigma^*}\int_{\mathbb{R}^k} k(x^*(s),\alpha)\,m^*(d\alpha\,ds)\right| < \epsilon/2.$$
The existence of $m^*(\cdot)$ is guaranteed by the definition of $\sigma^*$. There is $\nu^* > 0$
[which can depend on $x^*(\cdot)$] such that
$$x^*(t) + \gamma v(x(\sigma^*)) \in G^0 \tag{2.17}$$
for all $t \in [\sigma^*-\nu^*,\sigma^*)$ and $\gamma \in (0,\nu^*]$. For $B \subset \mathbb{R}^k$ and $x \in \mathbb{R}^k$, let
$B + x = \{y + x : y \in B\}$. For each $c > 0$ we define an admissible relaxed
control $m^c(\cdot)$ by
$$m^c_s(B) = \begin{cases} m^*_s(B) & \text{for } s \in [0,\sigma^*-\nu^*), \\ m^*_s\left(B - c\,v(x(\sigma^*))/\nu^*\right) & \text{for } s \in [\sigma^*-\nu^*,\sigma^*), \\ m_s\left(B + c\,v(x(\sigma^*))/\nu\right) & \text{for } s \in [\sigma^*,\sigma^*+\nu), \\ m_s(B) & \text{for } s \in [\sigma^*+\nu,\sigma]. \end{cases}$$
Let $x^c(\cdot)$ be the associated solution. The effect of this control is to force
$x^c(t) \in G^0$ for $t \in [\sigma^*-\nu^*,\sigma^*+\nu)$, for all sufficiently small $c > 0$. This

follows from (2.16) and (2.17). We also have xc(t) = x(t) and m~(-) = mt(-)
fort E [u* + v, u], and xc(t) = x*(t) fortE [0, u*- v*). Under (A2.2), we
have

f
lo
u*+v
J.
fik
k(xc(s), a)mc(dads)- f
lo
u*+v
f.
fik
k(x*(s), a)m*(dads) --+ 0

as c --+ 0. Because f > 0 is arbitrary, we find that u* + v E A, and thereby


obtain a contradiction. •

14.2.3 Convergence theorems


We first prove a general convergence result for chains that satisfy the local
consistency conditions (2.1) and (2.2) and also have uniformly bounded
running costs. The theorem is stated for chains that have a single discretization
parameter $h$. However, the statement and proof are essentially
the same for the two parameter families considered in Subsection 14.2.1,
with $(h,\delta) \to 0$ replacing $h \to 0$. The result will be used in many places in
this chapter and the one that follows and, consequently, is formulated in a
general way. We first set up the notation of the theorem. Let $\{\xi^h_i, i < \infty\}$
be a controlled Markov chain. For each $h > 0$, assume that the transition
probabilities $p^h(x,y|\alpha)$ and interpolation interval $\Delta t^h(\alpha)$ satisfy (2.1) and
(2.2). Let $\{u^h_i, i < \infty\}$ be the sequence of controls applied to the chain and
define
$$t^h_n = \sum_{i=0}^{n-1}\Delta t^h(u^h_i)$$
and the interpolated processes
$$\xi^h(t) = \xi^h_n,\qquad u^h(t) = u^h_n,\qquad t \in [t^h_n, t^h_{n+1}).$$
As always, we assume the controls are admissible in the sense of Section
2.3. For $S < \infty$, let $S_h = \inf\{t^h_n : t^h_n \ge S\}$. Define the relaxed control $m^h(\cdot)$
by
$$m^h_s(A) = I_A(u^h(s)),\qquad m^h(A \times [0,t]) = \int_0^t m^h_s(A)\,ds.$$
In the theorem statement, $c(h) \to 0$ is any sequence for which $c(h)|\alpha|\Delta t^h(\alpha)$
is a bound for the absolute value of the $o(|\alpha|\Delta t^h(\alpha))$ terms appearing in
(2.1) and (2.2). Such a sequence always exists under our definition of a
locally consistent chain.
Theorem 2.4. Assume that (1.1) holds, and that
$$\sup_h E_{m^h}\left[\int_{\mathbb{R}^k\times[0,S_h]} k(\xi^h(s),\alpha)\,m^h(d\alpha\,ds)\right] < \infty$$
for each $S < \infty$. Then the set $\{m^h(\cdot), h > 0\}$ is tight. Assume also that
the collection of initial conditions $\{\xi^h_0, h > 0\}$ is tight, and suppose that a
subsequence (again indexed by $h$) is given such that $\{m^h(\cdot), h > 0\}$ converges
weakly to a limit $m(\cdot)$ and $\xi^h_0$ converges weakly to $x$. Then $m(\cdot)$ is
an admissible relaxed control and $\{\xi^h(\cdot), h > 0\}$ converges weakly to a limit
$x(\cdot)$ that satisfies
$$x(t) - x = \int_0^t\int_{\mathbb{R}^k}\alpha\,m(d\alpha\,ds)$$
w.p.1.

Suppose for each $h > 0$ that $N_h$ is a stopping time for the chain $\{\xi^h_i, i < \infty\}$,
and let $\tau_h = t^h_{N_h}$. If
$$\lim_{h\to 0} c(h)\,E_{m^h}\int_0^{\tau_h}\int_{\mathbb{R}^k}|\alpha|\,m^h(d\alpha\,ds) = 0,$$
then the convergence is uniform in the sense that
$$\sup_{0\le t\le \tau_h\wedge S}\left|\xi^h(t) - \xi^h(0) - \int_0^t\int_{\mathbb{R}^k}\alpha\,m^h(d\alpha\,ds)\right| \to 0$$
in probability.

Proof. Fix $S < \infty$. By using the lower bound (1.2) and the assumptions
of the theorem, we obtain
$$\limsup_{h\to 0} E_{m^h}\left[\int_{\mathbb{R}^k\times[0,S_h]} l(|\alpha|)\,m^h(d\alpha\,ds)\right] < \infty, \tag{2.18}$$
and, by the superlinearity of $l(\cdot)$,
$$\limsup_{b\to\infty}\limsup_{h\to 0} E_{m^h}\left[\int_{\mathbb{R}^k\times[0,S_h]} I_{\{|\alpha|\ge b\}}\,|\alpha|\,m^h(d\alpha\,ds)\right] = 0. \tag{2.19}$$
Because the control space is $\mathbb{R}^k$, an arbitrary collection of relaxed controls
need not have compact closure. However, we have the estimate (2.18) on
the "tails" of these measures. A straightforward application of Prohorov's
theorem (Theorem 9.1.2) and Chebyshev's inequality can then be used to
show the sequence $\{m^h(\cdot), h > 0\}$ is tight in $\mathcal{R}(\mathbb{R}^k\times[0,\infty))$, and that any
limit measure $m$ satisfies $m(\mathbb{R}^k\times[0,S]) = S$ for all $S < \infty$ w.p.1.
Now suppose that a subsequence again indexed by $h$ is given such that
$\{m^h(\cdot), h > 0\}$ converges to a limit $m(\cdot)$. Define the process $\{e^h_i, i < \infty\}$
by $e^h_0 = 0$ and
$$e^h_{i+1} = e^h_i + E\left[\xi^h_{i+1} - \xi^h_i \mid \xi^h_i, u^h_i\right] - u^h_i\,\Delta t^h(u^h_i),$$
and the interpolation $e^h(t) = e^h_i$ for $t \in [t^h_i, t^h_{i+1})$. The process $e^h(\cdot)$ keeps
track of the "error" terms in (2.1). Owing to the definition of this process,
$$E\left(\xi^h_{i+1} - \xi^h_i - u^h_i\,\Delta t^h(u^h_i) - (e^h_{i+1} - e^h_i) \mid \xi^h_i, u^h_i\right) = 0.$$
Therefore, if we define
$$\gamma^h(t^h_n) = \xi^h_n - \xi^h_0 - \int_0^{t^h_n}\int_{\mathbb{R}^k}\alpha\,m^h(d\alpha\,ds) - e^h_n, \tag{2.20}$$
then the process $\{\gamma^h(t^h_i), i < \infty\}$ is a martingale. A calculation using (2.2)
gives
$$E_{m^h}\left|\gamma^h(S_h)\right|^2 \le c(h)\,E_{m^h}\int_0^{S_h}\int_{\mathbb{R}^k}|\alpha|\,m^h(d\alpha\,ds),$$
where $c(h) \to 0$ as $h \to 0$. Thus, under the assumptions of the theorem,
$$E_{m^h}\left[\xi^h(S_h) - \xi^h(0) - \int_0^{S_h}\int_{\mathbb{R}^k}\alpha\,m^h(d\alpha\,ds) - e^h(S_h)\right]^2 \to 0$$
and, by the martingale inequality (1.1.2),
$$\sup_{0\le t\le S}\left|\xi^h(t) - \xi^h(0) - \int_0^t\int_{\mathbb{R}^k}\alpha\,m^h(d\alpha\,ds) - e^h(t)\right| \to 0.$$
Using the Skorokhod representation we can assume that the convergence
$m^h(\cdot) \to m(\cdot)$ is w.p.1 in the topology of weak convergence of measures on
$\mathbb{R}^k\times[0,\infty)$. By (2.19) and the fact that $m(\mathbb{R}^k\times\{t\}) = 0$ for all $t \in [0,S]$,
we obtain
$$\int_0^t\int_{\mathbb{R}^k}\alpha\,m^h(d\alpha\,ds) \to \int_0^t\int_{\mathbb{R}^k}\alpha\,m(d\alpha\,ds)$$
w.p.1. It follows from (2.1) that $\sup_{0\le t\le S}|e^h(t)| \to 0$. Combining these
facts with the representation (2.20), we have that $(\xi^h(\cdot), m^h(\cdot))$ converges
in probability to $(x(\cdot), m(\cdot))$, where
$$x(t) = x + \int_0^t\int_{\mathbb{R}^k}\alpha\,m_s(d\alpha)\,ds$$
and $m_s(d\alpha)\,ds = m(d\alpha\,ds)$.

The last sentence of the theorem follows from the same calculations as
those given above, save that $\tau_h \wedge S$ replaces $S_h$. •

We are now ready to prove the convergence theorems. It will often be
convenient to have the chains $\xi^h_i, u^h_i$ defined for all $i < \infty$, and not only up
until the chain exits $G^0$ or is stopped by choice. Unless otherwise stated,
it is our convention that the control $\alpha = 0$ is used after our interest in the
chain stops.

Theorem 2.5. Assume (A2.1), (A2.2), (A2.3), and that (2.6) holds. Then
for the explicit scheme defined by (2.7) we have
$$V^{h,\delta}(x) \to V(x).$$

Proof. Let $\{\xi^{h,\delta}_i, i < \infty\}$ be a controlled locally consistent Markov chain
as in Subsection 14.2.1 with the transition probabilities $p^{h,\delta}(x,y|\alpha)$, interpolation
interval $\delta$, and initial condition $x \in G^0$. Let $\{u^{h,\delta}_i, i < \infty\}$ denote
a sequence of controls that are applied to the chain. We define the interpolated
process and control by setting
$$\xi^{h,\delta}(t) = \xi^{h,\delta}_i,\qquad u^{h,\delta}(t) = u^{h,\delta}_i,\qquad t \in [i\delta, i\delta+\delta).$$
Define $\tau_{h,\delta}$ to be the first time $\xi^{h,\delta}(\cdot)$ leaves $G^0_h$. As usual, we use relaxed
rather than ordinary controls to prove the convergence of the scheme. We
define $m^{h,\delta}(\cdot)$ by setting
$$m^{h,\delta}_s(A) = I_A(u^{h,\delta}(s)),\qquad m^{h,\delta}(A\times[0,t]) = \int_0^t m^{h,\delta}_s(A)\,ds$$
for all Borel sets $A$ in $\mathbb{R}^k$.


In the proofs of the upper and lower bounds we will have to deal with
a number of special cases, due to the possible discontinuity of g(·, ·). The
various cases are all determined by whether or not the process exits G, and
if so, by the time at which it exits. Regrettably, this lengthens the proof
somewhat.

Proof of the Lower Bound. For any $\epsilon > 0$, let $\{m^{h,\delta}(\cdot), h > 0, \delta > 0\}$
be the relaxed control representation of an $\epsilon$-optimal sequence of ordinary
admissible controls. In proving the lower bound, we can assume without
loss that the running costs associated with this sequence are bounded from
above, and therefore that Theorem 2.4 applies. Let $(h,\delta)$ denote any subsequence,
and retain $(h,\delta)$ to denote a further subsequence along which
$(\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \tau_{h,\delta})$ converges weakly. We have the inequality
$$V^{h,\delta}(x) \ge E^{m^{h,\delta}}_x\left[\int_0^{T\wedge\tau_{h,\delta}}\int_{\mathbb{R}^k} k(\xi^{h,\delta}(s),\alpha)\,m^{h,\delta}(d\alpha\,ds) + g\left(\xi^{h,\delta}(T\wedge\tau_{h,\delta}),\,T\wedge\tau_{h,\delta}\right)\right] - \epsilon.$$
Assume we are using the Skorokhod representation, so that the convergence
$(\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \tau_{h,\delta}) \to (x(\cdot), m(\cdot), \bar\tau)$ is w.p.1. Let $\tau = \inf\{t : x(t) \in \partial G\}$,
and fix an $\omega \in \Omega$ such that the convergence holds. It is automatic that
$\tau \le \bar\tau$. We claim that
$$\liminf_{(h,\delta)\to 0}\left[\int_0^{T\wedge\tau_{h,\delta}}\int_{\mathbb{R}^k} k(\xi^{h,\delta}(s),\alpha)\,m^{h,\delta}(d\alpha\,ds) + g\left(\xi^{h,\delta}(T\wedge\tau_{h,\delta}),\,T\wedge\tau_{h,\delta}\right)\right] \ge V(x). \tag{2.21}$$

The case $\bar\tau < T$, $\tau = \bar\tau$. In this case (2.21) follows from the convergence
$(\xi^{h,\delta}(\cdot), m^{h,\delta}(\cdot), \tau_{h,\delta}) \to (x(\cdot), m(\cdot), \bar\tau)$, Fatou's lemma, and (A2.2)
and (A2.3).

The case $\bar\tau < T$, $\tau < \bar\tau$. For this case we still have the left hand side of
(2.21) bounded below by
$$\int_0^{\bar\tau}\int_{\mathbb{R}^k} k(x(s),\alpha)\,m(d\alpha\,ds) + g(x(\bar\tau),\bar\tau).$$
Since $\tau < \bar\tau$, this cost is not the cost associated with $(x(\cdot), m(\cdot))$ in the
definition of $V(x)$. However, by Theorem 2.3 there exists a relaxed control
$\tilde m(\cdot)$ with associated solution $\tilde x(\cdot)$ that starts at $x$, remains in $G^0$ for $t \in (0,\bar\tau)$,
satisfies $\tilde x(\bar\tau) = x(\bar\tau)$, and has running cost arbitrarily close to that
of $(m(\cdot), x(\cdot))$. Thus, (2.21) holds in this case as well.

The cases $\bar\tau > T$, $\tau = \bar\tau$ and $\bar\tau > T$, $\tau < \bar\tau$ are handled in the same way
as $\bar\tau < T$, $\tau = \bar\tau$ and $\bar\tau < T$, $\tau < \bar\tau$, respectively. Finally, we must deal
with $\bar\tau = T$. The only difference between this case and the previous cases is
that $g(\cdot,\cdot)$ may be discontinuous at $(x(T),T)$. Consider first the case when
$\tau = \bar\tau$, i.e., $x(t) \in G^0$ for $t \in [0,T)$. In this case, (2.21) follows from the
fact that
$$\int_0^T\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds + g_*(x(T),T) \ge V(x),$$
where $g_*(x,T) = \lim_{\epsilon\to 0}\inf\{g(y,T) : |y - x| \le \epsilon\}$. To prove the last inequality,
it is enough to note that if $\lim_{\epsilon\to 0}\inf\{g(y,T) : |x(T)-y| \le \epsilon,\ y \in G^0\} < g(x(T),T)$,
then by (A2.1) and (A2.2) an admissible relaxed control
$\tilde m(\cdot)$ can be found which has running cost arbitrarily close to that of
$m(\cdot)$ and for which the associated solution $\tilde x(\cdot)$ stays in $G^0$ and terminates
at a point $\tilde x(T)$ which can be made arbitrarily close to $x(T)$. The case $\tau < \bar\tau$
is similar, save that Theorem 2.3 must be applied to deal with any parts
of $x(\cdot)$ on $\partial G$ for $t \in [0,T)$. Thus, (2.21) holds w.p.1.

By Fatou's lemma, $\liminf_{(h,\delta)\to 0} V^{h,\delta}(x) \ge V(x) - \epsilon$ along the convergent
subsequence. To prove $\liminf_{(h,\delta)\to 0} V^{h,\delta}(x) \ge V(x)$ for the original
sequence, we argue by contradiction, and then send $\epsilon \to 0$.

Proof of the Upper Bound. Fix $\epsilon > 0$ and let $m(\cdot)$ be an $\epsilon$-optimal
admissible relaxed control for $V(x)$. Let $x(\cdot)$ denote the associated solution.
We first show we can assume [by modifying $m(\cdot)$ if need be] that $x(\cdot)$ either
exits $G^0$ in a "regular" way or else remains in $G^0$.

Case 1. Suppose $x(t) \in G^0$ for $t \in [0,T]$. In this case no modification is
needed, and we leave $m(\cdot)$ unchanged.

Case 2. Suppose that $\tau = \inf\{t : x(t) \in \partial G\} < T$. Assumptions (A2.1) and
(A2.2) imply that we may redefine $m_s(\cdot)$ for $s$ in some interval of the form
$s \in (\tau,v]$, $v > \tau$, in such a way that $x(t) \notin G$ for $t \in (\tau,v)$ and
$$\int_\tau^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds \to 0 \quad\text{as } t \to \tau.$$

Case 3. Suppose that $x(t) \in G^0$ for $t \in [0,T)$ and $x(T) \in \partial G$. Then there
are two possibilities. If $\lim_{\epsilon\to 0}\inf\{g(y,T) : |x(T)-y| \le \epsilon,\ y \in G^0\} \le g(x(T),T)$,
then by the same argument as in Theorem 2.3 there exists a
control $m^\epsilon(\cdot)$ which is $2\epsilon$-optimal and for which the associated solution
remains in $G^0$ for all $t \in [0,T]$. Otherwise, we can use the exterior cone
condition of (A2.1) and (A2.2) and find a control $m^\epsilon(\cdot)$ which is $2\epsilon$-optimal
and for which the associated solution exits $G^0$ before time $T$. Hence, Case
3 reduces to either Case 1 or Case 2.

The main point of all this is that we can always find an $\epsilon$-optimal control
that avoids the problematic case where $x(\cdot)$ exits $G^0$ at time $T$. More
precisely, we may assume that given any $\epsilon > 0$ there exists an $\epsilon$-optimal
control $m(\cdot)$ with associated solution $x(\cdot)$ such that either
$$x(t) \in G^0 \text{ for all } t \in [0,T], \tag{2.22}$$
or else there are $\tau \in [0,T)$ and $v > \tau$ such that
$$x(t) \in G^0 \text{ for } t \in [0,\tau),\quad x(t) \notin G \text{ for } t \in (\tau,v],\quad \int_\tau^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds \to 0 \text{ as } t \to \tau. \tag{2.23}$$

The case $\tau > T$. Let any $\epsilon_1 > 0$ be given. By Theorem 2.2 there are a finite
set $U_{\epsilon_1} \subset \mathbb{R}^k$, $\delta > 0$, and an ordinary control $u^{\epsilon_1}(\cdot)$ with the following
properties: $u^{\epsilon_1}(\cdot)$ takes values in $U_{\epsilon_1}$, is constant on intervals of the form
$[j\delta, j\delta+\delta)$, and if $x^{\epsilon_1}(\cdot)$ is the associated solution, then
$$\sup_{0\le t\le T}|x^{\epsilon_1}(t) - x(t)| \le \epsilon_1$$
and
$$\sup_{0\le t\le T}\left|\int_0^t\int_{\mathbb{R}^k} k(x(s),\alpha)\,m_s(d\alpha)\,ds - \int_0^t k(x^{\epsilon_1}(s),u^{\epsilon_1}(s))\,ds\right| \le \epsilon_1.$$
Under (2.6) we have $U_{\epsilon_1} \subset U^{h,\delta}$ whenever $h > 0$ and $\delta > 0$ are sufficiently
small. We apply the control $u^{\epsilon_1}(\cdot)$ to the chain $\{\xi^{h,\delta}_i, i < \infty\}$ in the obvious
way, namely,
$$u^{h,\delta}_i = u^{\epsilon_1}(i\delta).$$

Define $\xi^{h,\delta}(\cdot)$ to be the piecewise constant interpolation of $\{\xi^{h,\delta}_i, i < \infty\}$
with interpolation interval $\delta$. Then, by Theorem 2.4, $\xi^{h,\delta}(\cdot) \to x^{\epsilon_1}(\cdot)$ w.p.1.
Suppose we first let $(h,\delta) \to 0$ and then send $\epsilon_1 \to 0$. By applying the
dominated convergence theorem and using (A2.2) and (A2.3), we have
$$\limsup_{(h,\delta)\to 0} V^{h,\delta}(x) \le V(x) + \epsilon.$$

The case $\tau < T$. Next consider the case of (2.23). As in the case of (2.22) we
can assume the existence of a piecewise constant ordinary control with the
given properties, save that $T$ is replaced by $v$. Because $\sup_{0\le t\le v}|x^{\epsilon_1}(t) - x(t)|$
can be made arbitrarily small, (2.23) implies we can also assume that
$$x^{\epsilon_1}(t) \notin G \quad\text{for some } t \in (\tau,v]. \tag{2.24}$$
We now apply $u^{\epsilon_1}(\cdot)$ as in the previous case. By sending $(h,\delta) \to 0$ and then
$\epsilon_1 \to 0$, we obtain $\limsup_{(h,\delta)\to 0} V^{h,\delta}(x) \le V(x) + \epsilon$ from (A2.2), (A2.3),
(2.23), and (2.24). Since $\epsilon > 0$ is arbitrary, the upper bound is proved. •

A very similar proof gives the following result.

Theorem 2.6. Assume (A2.1), (A2.2), and (A2.3). Then for the implicit
scheme defined by (2.11) we have
$$V^{h,\delta}(x) \to V(x).$$
14.3 Problems with a Discontinuous Running Cost


Consider once again the finite time problem first introduced in (1.4) in
Section 14.1:

V(x) ~ inf [[M k( ¢(s), ¢( s) )ds + g( ¢(TAT), TAT)]· (3.1)

Recall that r is the first time that ¢( ·) exits the interior of a given set G
and that T > 0 is fixed.
In many calculus of variations problems, there is some underlying dynam-
ical model for a "physical" system which determines the function k( ·, ·), and
minimizing (or nearly minimizing) paths ¢(-) have important physical in-
terpretations. For example, in the case of classical Hamiltonian mechanics,
the relevant laws of motion define k(·, ·), and the path ¢0 describes the
trajectory followed by a particle subject to those laws. In the case of ge-
ometric optics, k(·, ·) is defined in terms of the local speed of light. Thus,
in the typical formulation of a calculus of variations problem appropriate

to some given applied problem, the function k(·, ·) is obtained during the
modelling stage.
Clearly, the function k(·, ·) reflects properties of the "medium" in which
the underlying dynamical system evolves. Typically, the definition is local
in the sense that k(x, ·)reflects the properties of the medium at x. If k(x, a)
possesses some kind of continuity in x, then it seems reasonable to believe
this reflects a type of spatial continuity in the properties of the medium;
e.g., the speed of propagation varies continuously in x in the geometri-
cal optics problem. However, in many problems of interest such continuity
properties may be violated. For example, it may be the case that the space
is divided into two (or more) disjoint regions $R^{(1)}$ and $R^{(2)}$, with a smooth
interface separating the regions. In each region the physical properties of
the media vary continuously, but they differ from one region to the other.
It is simple to produce examples of this type from classical mechanics or
geometrical optics. More recent examples come from large deviation the-
ory [40] (and, in particular, the application of large deviation theory to
queueing systems [44]).
In such a case, one must rethink the modelling of the original physical
problem as a calculus of variations problem. Clearly, the modelling ap-
propriate for a single region of continuous behavior should be appropriate
for defining or identifying $k(x,\cdot)$ when $x$ is in the relative interior of either
of the regions $R^{(1)}$ or $R^{(2)}$. However, there is still the question of the
proper definition of k(x, ·) for points x on the interface. This is quite im-
portant because in many cases the optimal paths will spend some time on
the interface. The mathematical problem (3.1) will be well posed under
just appropriate measurability assumptions on k(·, ·). But from the point
of view of modelling, certain additional properties can be expected (or per-
haps should even be demanded). For example, regardless of how k(x, ·) is
defined on the interface it should lead to a cost function V(x) which has
desirable stability properties under approximations, discretizations, small
perturbations, etc. This turns out to impose restrictions on the form of
k(x, ·).
In this section we will present what appears to be a "natural" definition
of the integrand on the interface, and describe an associated numerical
procedure. By natural, what is meant is that the definition occurs in ap-
plications, leads to a value that is stable under discretizations, and can be
shown to be the only definition on the boundary that is stable under a wide
class of approximations. We will also show that this particular definition of
the cost on the interface, in spite of its complicated appearance, allows the
use of relatively simple numerical approximation procedures. To simplify
the notation we will assume that the interface is "flat," in which case we
can take the interface to be $\{x \in \mathbb{R}^k : x_1 = 0\}$, $R^{(1)} = \{x : x_1 \le 0\}$, and
$R^{(2)} = \{x : x_1 > 0\}$, where the subscript denotes the first component of $x$.
Generalizing the results contained in this section to cover a smooth curved
boundary is not difficult. The case of several intersecting boundaries, which

we do not consider, is more subtle and not yet fully understood.

Remark on the Notation. We will need to use subscripts and superscripts
in several different ways, such as to denote discrete time indices,
regions [i.e., $R^{(1)}$ or $R^{(2)}$], components of a vector, and so on. To simplify,
the following conventions will be used. We will denote the first component
of a vector $\alpha$ by $(\alpha)_1$. Thus, the symbol $\alpha_1$ will not be used for the first
component of $\alpha$. Furthermore, if a quantity is intrinsically associated with
the region $R^{(i)}$, then it will bear a superscript of the form $(i)$ [e.g., $\alpha^{(1)}$ and
$\alpha^{(2)}$].

14.3.1 Definition and interpretation of the cost on the interface

For each $i = 1,2$, let $k^{(i)} : \mathbb{R}^k\times\mathbb{R}^k \to \mathbb{R}$ satisfy (1.1) and (A2.2). For
simplicity we will assume that $k^{(i)}(x,\cdot)$ is convex for each $x$ and $i = 1,2$.
The convexity assumption can be dropped at the expense of complicating
the formula for $k^{(0)}(\cdot,\cdot)$ given in the next two paragraphs.

We define
$$k^{(0)}(x,\alpha) = \inf\left\{p^{(1)}k^{(1)}(x,\alpha^{(1)}) + p^{(2)}k^{(2)}(x,\alpha^{(2)})\right\}, \tag{3.2}$$
where the infimum is over $(p^{(1)},p^{(2)}) \in \mathbb{R}^2$ and $(\alpha^{(1)},\alpha^{(2)}) \in \mathbb{R}^{2k}$ satisfying
$$p^{(1)} \ge 0,\quad p^{(2)} \ge 0,\quad p^{(1)} + p^{(2)} = 1, \tag{3.3}$$
$$(\alpha^{(1)})_1 \ge 0,\qquad (\alpha^{(2)})_1 \le 0, \tag{3.4}$$
$$p^{(1)}\alpha^{(1)} + p^{(2)}\alpha^{(2)} = \alpha. \tag{3.5}$$
We then define
$$k(x,\alpha) = \begin{cases} k^{(1)}(x,\alpha) & \text{if } (x)_1 < 0, \\ k^{(0)}(x,\alpha) & \text{if } (x)_1 = 0, \\ k^{(2)}(x,\alpha) & \text{if } (x)_1 > 0. \end{cases} \tag{3.6}$$

Clearly, the function $k(x,\alpha)$ may be discontinuous in $x$, and, therefore,
(A2.2) may not be satisfied. It follows directly from the definition that
$k^{(0)}(x,\alpha) \le k^{(1)}(x,\alpha)$ for $\alpha$ satisfying $(\alpha)_1 \ge 0$ and $k^{(0)}(x,\alpha) \le k^{(2)}(x,\alpha)$
for $\alpha$ satisfying $(\alpha)_1 \le 0$. It also follows that $k^{(0)}(\cdot,\cdot)$ satisfies the superlinearity
condition (1.1). Note that only those values of $k^{(0)}(x,\alpha)$ for $x$ and $\alpha$
satisfying $(x)_1 = 0$ and $(\alpha)_1 = 0$ affect the value of $V(x)$.
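
Although $k^{(0)}(\cdot,\cdot)$ never has to be evaluated inside the numerical scheme described below, it is sometimes useful to compute it directly, for example to inspect the limiting running cost on the interface. The following Python sketch does so with a general-purpose solver; the function names and the use of SLSQP are illustrative assumptions, not part of the text's construction.

```python
import numpy as np
from scipy.optimize import minimize

def k0(x, a, k1, k2):
    """Numerically approximate the interface cost k^(0)(x, a) of (3.2)-(3.6).

    k1 and k2 are the region running costs k^(1)(x, .) and k^(2)(x, .), assumed
    convex in the control.  The decision vector is z = (rho, alpha1, alpha2)
    with rho = p^(1) and p^(2) = 1 - rho.  Only the case (a)_1 = 0 matters for
    V(x), and the starting point below is feasible in that case.  A production
    code would exploit convexity instead of calling a general NLP solver.
    """
    a = np.asarray(a, dtype=float)
    dim = a.size

    def objective(z):
        rho, a1, a2 = z[0], z[1:1 + dim], z[1 + dim:]
        return rho * k1(x, a1) + (1.0 - rho) * k2(x, a2)

    cons = [
        # p^(1) alpha^(1) + p^(2) alpha^(2) = a        [constraint (3.5)]
        {"type": "eq",
         "fun": lambda z: z[0] * z[1:1 + dim] + (1.0 - z[0]) * z[1 + dim:] - a},
        # (alpha^(1))_1 >= 0 and (alpha^(2))_1 <= 0    [constraint (3.4)]
        {"type": "ineq", "fun": lambda z: z[1]},
        {"type": "ineq", "fun": lambda z: -z[1 + dim]},
    ]
    bounds = [(0.0, 1.0)] + [(None, None)] * (2 * dim)  # 0 <= rho <= 1 [constraint (3.3)]
    z0 = np.concatenate([[0.5], a, a])                  # feasible when (a)_1 = 0
    res = minimize(objective, z0, method="SLSQP", bounds=bounds, constraints=cons)
    return res.fun
```

For instance, with simple quadratic region costs such as $k^{(i)}(x,\alpha) = c_i|\alpha|^2$, the output of such a routine can be checked against the value obtained by carrying out the minimization analytically.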

Remark. An analogous definition can be given for more general classes


of deterministic optimal control problems under a mild "controllability"
condition. The appropriate definition and controllability condition can be
found by applying the heuristic argument given below.

Interpretation of the Cost. Perhaps the simplest way to motivate the
form of the cost $k^{(0)}(\cdot,\cdot)$ on the interface is to imagine the continuous
time problem (3.1) as arising as the limit of a sequence of discrete time
problems. To simplify the notation, we will take $G = \mathbb{R}^k$, and use $g(x)$
instead of $g(x,t)$. Thus, let $\Delta > 0$ and consider discrete time dynamics of
the form
$$\phi^\Delta_{i+1} = \phi^\Delta_i + \Delta u_i,\qquad \phi^\Delta_0 = x,$$
and suppose that the cost to be minimized over all $\{u_i, i = 0,\ldots,(T/\Delta)-1\}$
is
$$\sum_{i=0}^{(T/\Delta)-1}\Delta\left[I_{\{(\phi^\Delta_i)_1\le 0\}}k^{(1)}(\phi^\Delta_i,u_i) + I_{\{(\phi^\Delta_i)_1>0\}}k^{(2)}(\phi^\Delta_i,u_i)\right] + g\left(\phi^\Delta_{T/\Delta}\right).$$
For specificity, we have assigned the cost $k^{(1)}(\cdot,\cdot)$ to points on the interface
for the discrete time problem, but this does not actually matter in the limit
$\Delta \to 0$.
One can then ask if there is a continuous time calculus of variations
problem for which the minimal cost is the limit of the minimal costs for
these discrete time problems, and if so, what is the running cost. It is not
hard to show that there is a limiting continuous time problem and that it is
given by (3.1) with the running cost defined by (3.2) to (3.6). It is obvious
that the correct running cost away from the interface is either $k^{(1)}(\cdot,\cdot)$ or
$k^{(2)}(\cdot,\cdot)$. That $k^{(0)}(\cdot,\cdot)$ gives the proper definition on the interface can be
heuristically argued as follows. Consider a section $i = i_1, i_1+1,\ldots,i_2$ of the
discrete trajectory $\{\phi^\Delta_i, i = 0,\ldots,T/\Delta - 1\}$ when $\phi^\Delta_i$ is close to the interface,
and suppose that the average (in time) "velocity" (i.e., the average of
the corresponding $u_i$) for $i \in \{i_1,\ldots,i_2\}$ is $\alpha$. Because $(\phi^\Delta_i)_1 \approx 0$ at both the
beginning and end of the section, we have $(\alpha)_1 \approx 0$. Given that the process
will remain in a small neighborhood of the boundary for a given period of
time, it has the option of "selecting" between the two costs $k^{(1)}(\cdot,\cdot)$ and
$k^{(2)}(\cdot,\cdot)$, since the process can move quickly and "cheaply" from one side
of the interface to the other. Thus, for a certain fraction of time, the running
cost $k^{(1)}(\cdot,\cdot)$ will be used and, for the remaining fraction of time, the
cost $k^{(2)}(\cdot,\cdot)$ will apply. In order to learn the exact form of the cost on the
interface, we must determine the optimal way in which the discrete time
process can exploit the two available costs $k^{(1)}(\cdot,\cdot)$ and $k^{(2)}(\cdot,\cdot)$ while simultaneously
maintaining an average velocity $\alpha$. The definition of $k^{(0)}(\phi^\Delta_i,\alpha)$
suggests the answer. Let $p^{(1)}, p^{(2)}, \alpha^{(1)}$, and $\alpha^{(2)}$ minimize (3.2) subject
to (3.3)-(3.5). Then a nearly optimal way for the discrete time process to
behave is to use $u_i = \alpha^{(1)}$ when the state $\phi^\Delta_i$ is in the set $\{x : (x)_1 \le 0\}$
and $u_i = \alpha^{(2)}$ when the state is in the set $\{x : (x)_1 > 0\}$. The $p^{(i)}$ turn
out to represent respective fractions of time spent in the two sets, and the
interpretations of (3.3) and (3.5) are clear. The constraint (3.4) is essentially
a feasibility condition. If this condition is violated, then the discrete
time process will actually move away from the interface, contradicting our
assumption that it remain near the interface for $i \in \{i_1, i_1+1,\ldots,i_2\}$.

14.3.2 Numerical schemes and the proof of convergence


We next consider the construction of numerical schemes for the problem
(3.1). It turns out that we do not need to have access to or to compute
$k(\cdot,\cdot)$ for the points on the interface $\{x : (x)_1 = 0\}$. Instead, a convergent
scheme can be constructed using only $k^{(1)}(\cdot,\cdot)$ and $k^{(2)}(\cdot,\cdot)$. This is due
to the form that the running cost takes on the interface and is another
manifestation of the robustness of the definition of the cost given in (3.2)-(3.6).
The numerical scheme to be defined below, which uses only the costs
$k^{(1)}(\cdot,\cdot)$ and $k^{(2)}(\cdot,\cdot)$, will nonetheless converge to the value function with
the cost $k^{(0)}(\cdot,\cdot)$ on the interface. This fact is of some significance because
it means that $k^{(0)}(\cdot,\cdot)$ never actually needs to be computed when computing
the approximations to $V(x)$. This is especially useful for problems where
the functions $k^{(1)}(\cdot,\cdot)$ and $k^{(2)}(\cdot,\cdot)$ take a relatively simple form in $\alpha$, such
as quadratic or exponential. For these cases, the minimizations that must
be done in order to compute an approximation to $V(x)$ often can be done
analytically rather than numerically. This would not be the case if $k^{(0)}(\cdot,\cdot)$
appeared in the definition of the schemes.
We can use either the implicit or explicit schemes described in Subsection
14.2.1. For concreteness, we will consider the explicit method given by (2.7).
However, as remarked above, we will not use the running cost $k(\cdot,\cdot)$ in the
formulation of the scheme. If $x \in G^0_h$ and $(x)_1 < 0$, then we will use the
running cost $k^{(1)}(\cdot,\cdot)$. If $x \in G^0_h$ and $(x)_1 > 0$, we will use $k^{(2)}(\cdot,\cdot)$. For any
points in $G^0_h$ that satisfy $(x)_1 = 0$, we can use either $k^{(1)}(\cdot,\cdot)$ or $k^{(2)}(\cdot,\cdot)$ (it
will not actually matter in the limit as the discretization parameters tend
to zero), and for specificity we will use $k^{(1)}(\cdot,\cdot)$ for these points. Thus, if
we define
$$\bar k(x,\alpha) = \begin{cases} k^{(1)}(x,\alpha) & \text{if } (x)_1 \le 0, \\ k^{(2)}(x,\alpha) & \text{if } (x)_1 > 0, \end{cases} \tag{3.7}$$
then the numerical scheme becomes
$$V^{h,\delta}(x,n\delta) = \min_{\alpha\in U^{h,\delta}}\left[\sum_y p^{h,\delta}(x,y|\alpha)\,V^{h,\delta}(y,n\delta+\delta) + \bar k(x,\alpha)\,\delta\right] \tag{3.8}$$
for $x \in G^0_h$ and $n\delta < T$, together with the boundary and terminal condition
$V^{h,\delta}(x,n\delta) = g(x,n\delta)$ for $x \notin G^0_h$ and $n\delta \le T$, or $x \in G^0_h$ and $n\delta = T$. Recall
that $U^{h,\delta}$ was defined in (2.5).
Because our main interest in this section is in examining the new features
that are associated with the discontinuous running cost, we will replace
(A2.1) and (A2.3) by the following somewhat stronger conditions.

A3.1. The set $G$ is compact and satisfies interior and exterior cone conditions:
there exist $\epsilon > 0$ and continuous functions $v(\cdot)$ and $w(\cdot)$ such that
given any $x \in \partial G$, $\bigcup_{0<a<\epsilon}B_{\epsilon a}(x + a v(x)) \subset G$ and $\bigcup_{0<a<\epsilon}B_{\epsilon a}(x + a w(x)) \cap G = \emptyset$.
The set $G$ also satisfies the additional condition that $(v(x))_1 = 0$ if
$x \in \partial G$ and $(x)_1 = 0$.

A3.2. The function $g(\cdot,\cdot)$ is continuous and bounded on $\mathbb{R}^k\times[0,T]$, and
the canonical transition probabilities of Example 2.1 are used to define $p^{h,\delta}$.

Condition (A3.1) essentially states that there are no points in $\partial G$ at
which $\partial G$ is "tangent" to the interface $\{x : (x)_1 = 0\}$. Note that the
largest jump in the direction perpendicular to the interface that can occur
under (A3.2) is of size $h$.

Theorem 3.1. Assume {A3.1), {A3.2), and that both k< 1>(·, ·) and k< 2>(·, ·)
satisfy {1.1) and {A2.2). Let k(·, ·) be defined by {3.2)-{3.6), assume that
{2.6) holds, and let V(x) be defined by (3.1). Then for the explicit scheme
defined by {3.8), we have

Vh• 6 (x) ~ V(x).

Proof. We begin by recalling the notation used in the proof of Theorem 2.5.
e?·
Thus, { 6 , i < oo} is a controlled Markov chain as described in Subsection
14.2.1 with transition probabilities f}• 6 (x, yJo:), interpolation interval6, and
initial condition x E G0 • If { u~· 6 , i < oo} is a sequence of controls applied
to the chain, then we define the interpolated process and control by setting

eh•6 (t)=e?· 6 , uh• 6 (t)=u~· 6 , tE[i6,i6+6).

Define Th,{j to be the first time eh·6 (·) leaves~. and let mh· 6 (-) denote the
relaxed control representation of uh· 6 ( ·).
Before proving the lower bound, we must state the following lemma. The
lemma is analogous to and is used in the same way as Theorem 2.3.

Lemma 3.2. Assume the conditions of Theorem 3.1, and let cf>(·) be an
absolutely continuous function that satisfies cf>{O) E G0 , cf>(t) E G fort E
[O,a], and
1u k(cf>(s), ~(s))ds < oo.
Then given E > 0, there exists an absolutely continuous function cf>E(·) such
that cf>E(t) E G0 fort E [0, a), cf>E(O) = cf>{O), cf>E(a) = cf>(a), and

11u k(cf>(s),~(s))ds -lou k(cf>E(s),~E(s))dsl < €.

The proof is essentially the same as that of Theorem 2.3 and is therefore
omitted. We will only note that under {A3.1) we can assume the existence of
"( > 0 and v*{-) such that X E oG and J(x)IJ ~"(imply (v*(x))t = 0, where
v*(-) satisfies the conditions on v(·) given in the statement of the interior
390 14. Problems from the Calculus of Variations: Finite Time Horizon

cone condition. This means that when perturbing ¢( ·) as in the proof of


Theorem 2.3, the applied perturbation will be parallel to the interlace when
¢( ·) is on the interface. This implies a continuity property for the running
cost in spite of the discontinuity across the interface.

Proof of the Lower Bound. Consider any sequence of ordinary admissi-


u:·
ble controls { 6, i < oo} that are applied to the chain and let mh• 6( ·) be
the associated relaxed control representation .of the interpolation uh• 6 ( ·). In
proving the lower bound, we may assume that the associated running costs
are bounded from above. By Theorem 2.4, {(eh· 6 (·),mh• 6 (·)),h > 0,8 > 0}
is tight.
We next define random measures ll(i),h, 6 (-) on Rk x [O,T] by

11( 1),h, 6 (A X B) = LI{uh,6(s)EA,(eh,&(s))t~o}ds,


11( 2),h, 6 (A X B) = L I{uh,&(s)EA,(eh,&(s))t>O}ds.

Note that since mh· 6 (-) = 11( 1),h, 6 (·) + 11( 2),h,6 (·), the tightness of {mh· 6 (·),
h > 0,8 > 0} implies the tightness of {ll(i),h, 6 (·),h > 0,8 > 0} fori=
1, 2. The measures 11( 1),h,6 (-) and 11( 2),h, 6 (·) record the control effort that
is applied, when it is applied, and also distinguish between when the state
eh·6(s) is in {x: (x)t ~ 0} and {x: (xh > 0}.
We now apply the Skorokhod representation and extract a weakly con-
verging subsequence from

with limit
( x(·), m(-), 11(1) (-), 11< 2>(·),f).

It follows easily from the definitions that ll(i}(R,k x [O,t]) ~ t, w.p.l. We


may therefore conclude the existence of subprobability measures 11~ 1 ) ( ·) and
~~~ 2 )(·), 0 ~ s ~ T, such that ll(i)(A x [O,t]) = J;~~~i)(A)ds,i = 1,2, for all
Borel sets A and t E [0, T], w.p.l.
The measures 11~ 1 ) ( ·) and 11~ 2 ) ( ·) possess the following properties. Almost
surely in s, and w.p.l,

(x(s))l < 0 o? r 11~ 1 >(da) 1,


JJRk
=
(3.9)
(x(s)h > 0 o? r 11~2>(oo) = 1,
JJRk
(3.10)
14.3 Problems with a Discontinuous Running Cost 391

r av~ 1 )(da) + Jr av~2)(da) = Jr ams(da) = x(s).


JJRk JRk JRk
(3.12)

Equation (3.9) follows easily from the definitions of the v(i),h,.S(·) and the
weak convergence, whereas (3.10) and (3.12) follow from the relationship
mh· 8 (-) = v(l),h,<l(·) + v( 2),h,<5(-). The only property that is not obvious is
(3.11). We will first prove the lower bound assuming (3.11) and then show
(3.11).
Now fix an w for which there is convergence via the Skorokhod represen-
tation. We have

~ r r k(l)(x(s),a)V(l)(dads)
T/\-

lo J
T

JRk

+ r r k( )(x(s),a)v( )(dads)
T/\-
T 2 2
lo lmk
= 1 Lk [k( 1 )(x(s),a)v~ 1 )(da)+k(2>(x(s),a)v~ 2)(da)]
T/\-
r ds.

The set {s: (x(s))l = 0, (x(s)h =f. 0} is a set of measure zero. Therefore,
the definition of k(·, ·),the convexity of the k(i)(x, ·),and the properties of
the v;i)(·) given in (3.9)-(3.12) imply

Lk [k( 1 )(x(s),a)v~ 1 )(da) + k(2)(x(s),a)v~2)(da)] ~ k(x(s),x(s))

a.s. in s. We also have

g(~h· 8 (T 1\ Th,<>), T 1\ Th,o) ~ g(x(T 1\ 1'), T 1\ 1').

Assembling the inequalities, we conclude that

liminf 1T/\rh.61 k(~h· 8 (s),a)mh,<l(dads) + g(~h· 8 (T 1\ Th,<>), T 1\ Th,<l)

~1
(h,<l)--+0 0 JRk
TAr
k(x(s),x(s))ds+g(x(T/\f),T/\1').
392 14. Problems from the Calculus of Variations: Finite Time Horizon

As always, there is the difficulty due to the fact that r = inf{t: x(t) E 8G}
might be smaller than f. Using Lemma 3.2 in precisely the same way that
Theorem 2.3 was used in the proof of Theorem 2.5, we conclude that

1 TM
k(x(s), x(s) )ds + g(x(T 1\ f), T 1\ f) 2: V(x).

It follows from Fatou's lemma that


liminf Vh• 6 (x) 2: V(x)
(h,6)--t0

for the convergent subsequence. The proof of the lower bound is completed
by using an argument by contradiction.

We now prove {3.11). Consider any subsequence from

{ ( ~h,6(·), mh,6(·), 11 (1),h,6(·), 11 (2),h,6(-)), h > o, 6> 0}


that converges weakly to a limit

(x(·),m(·),v< 1)(·),v<2)(-)).

By the Skorokhod representation, we can assume that the convergence is


w.p.l. Fix"(> 0. In order to proceed with the proof of {3.11), we must first
build an approximation to the function F : 1R ~ 1R given by

lzl ::; 'Y


F(z) = { ~I if
if lzl > 'Y·
For each 11 > 0, let F'''(·) be a function such that
IF71(z)l < 2"( for all z,
F71(z) = F(z) for z E [-"f,"f],
IFJz(z)l < B for z f/. [-'Y/2,"{/2],
where B < oo depends on 1J > 0. Let

f71(z) = { FJ(z) if z f 0
-1 if z = 0,

and assume that J71(z) ~ 0 for z f/. [-"{, "f] as 1J ~ 0. We can write
T/6-1
4"( 2: L F71 ((~;:1h)- F71 ((~7' 6 h)
j=O
T/6-1
= L c5J71((~7'6h) [<u~'6h] + e~,6 + e~·6,
j=O
14.3 Problems with a Discontinuous Running Cost 393

where
T/8-1
e~· 8 = L r'((e;· h) [(e;:1h - (e;· h -t5(u7' h]
8 8 8
j=O

and
T/8-1
e;· 8 = L [F'~((e;tlh)- F'~((e7' 8 h)] -r'((e7·8 h) [(e;tlh- (e7' 8 h].
j=O

Owing to local consistency, e~· 8 --+ 0 in probability. We now use the fact
that F'~ approximates the absolute value function near the origin to deal
with the term e;•
8 • If Z1 and Z2 are smaller than "f, then

If Z1 and z2 are on the same side of the origin, then

where z is some point between z 1 and z2 • Suppose that h ~ 'Y/4. Then


eh·8 > eh· 8 where
2 - 2 '

T/8-1
e;· 8 = L [r((e;· h)- r'((e;· )d] [(e;tlh- (e;· h] I{l<~;·~hl~-r/4}
j=O
8 8 8

and (C' h
8 is a point between 8 (e;· h
and (e;tlh·
Since eh·8(-) converges
uniformly to a process with continuous sample paths,

in probability. Since

and the latter quantity is finite, e;·


8 --+ 0 in probability.
Sending (h, 15)--+ 0, and using the boundedness of the running costs and
the superlinearity condition (1.1) to justify convergence of the integrals, we
have
394 14. Problems from the Calculus of Variations: Finite Time Horizon

where J(l),l'/(-) [respectively f( 2)·'1(·)] is a continuous extension of ro from


(-oo,O) to (-oo,O] [respectively (O,oo) to [O,oo)J.
Sending 77---t 0, and using the definition of jl'l(-), we have

[ (ahlro,-yJ((x(s))l)11( 2)(dads)
jJRkx[O,T]

- [ (a)ll[--y,OJ((x(s)}l)ll(l)(dads) ~ 4'Y.
jJRkx(O,T]

A very similar proof that is based on approximating the function

if izl ~ 'Y
F(z) = { ~ if z > 'Y
-'Y ifz<-'Y

shows that

[ (ahlro,-yj((x(s))l)v( 2)(dads)
jJRkx[O,T]

+ [ (ahl[--y,OJ((x(s)}l)ll(l)(dads) ~ 4'Y.
JJRkx[O,T]

Adding and subtracting these equations gives

1 JRkx[O,T]
(a)ll[--y,OJ((x(s))l)ll(l)(dads) > -8'Y,

1 JRkx[O,T]
(ahlro,-yj((x(s))I)11( 2)(dads) < 8'Y.

Sending 'Y ---t 0, we have shown that

1
JRkx[O,T]
(ahl{o}((x(s))I)II(l)(dads) > 0,

1
JRkx(O,T]
(ahl{o}((x(s))1)11( 2 )(dads) < 0
(3.13)

w.p.l.
The argument that led to (3.13) can be repeated with s restricted to any
interval [a, b] c [0, T] with the same conclusion. Thus we can assume it
holds simultaneously for all such intervals with rational endpoints. Using
the definitions of 11~ 1 ) ( ·) and 11~ 2 ) ( •), this implies
14.3 Problems with a Discontinuous Running Cost 395

whenever (x(s)h = 0 a.s. ins and with probability one. This proves (3.11).

Before proving the upper bound we present the following lemma. The
lemma implies the existence of an €-optimal piecewise linear path for the
calculus of variations problem and is analogous to Theorem 2.2. The proof
of the lemma is given at the end of the section.
Lemma 3.3. Assume the conditions of Theorem 3.1. Then given any € > 0
and any absolutely continuous path¢: (0, T] --+ JRk satisfying ¢(0) = x and
rT .
Jo k(¢(s),¢(s))ds < oo, there exist N < oo, 0 =to< tt < · · · < tN = T,
and a function uE : (0, T] --+ JRk which is constant on the intervals [tn, tn+l),
n < N, such that if

then

and
sup [
o~t~T lo
t k(¢E(s),uE(s))ds- lot k(¢(s),¢(s))ds] : : :; €.
Furthermore, we can assume that for any n < N that either (¢E(t)h =1- 0
for all t E (tn, tn+l) or (¢E(t))! = 0 for all t E (tn, tn+l)·
Proof of the Upper Bound. Fix € > 0, and choose¢(·) with ¢(0) = x
such that if T = {t: ¢(t) E 8G}, then
{T/\r
lo k(¢(s), <i>(s))ds + g(¢(T 1\ r), T 1\ r):::::; V(x) + €. (3.14)

The case T > T. By Lemma 3.3 and (A3.2), there exists ¢E(-) satisfying
the conditions of the lemma and also

1T k(¢E(s), <i>E(s))ds + g(¢E(T), T) : : :; 1T k(¢(s), <i>(s))ds + g(¢(T), T) + €.

(3.15)
For each n < N, let an = <i>E(t), where t is any point in (tn, tn+l)· If
(¢E(t))l =1- 0 fortE (tn, tn+l), then we define a~1 ) = a~2 ) =an. If (¢E(t)h =
0 for t E (tn, tn+l), then we must prescribe a control that will yield a
running cost close to k<0 >( ¢E (t), an). By exploiting the continuity of the
k<i>(-,·), j = 1,2, there exist p~1 >,p~2 >,a~1 ) and a~2) which satisfy

Pn(l} + p(2}
n
= 1
'
p(l}
n
> 0 ' p(2}
n
> 0' (3.16)

(3.17)
396 14. Problems from the Calculus of Variations: Finite Time Horizon

(l)a(l}
Pn n + p(2)a(2)
n n =an (3.18)
and

for all t E (tn. tn + c5), where c5 is some positive number. Because t/l(t) is
not constant, it may not actually be the case that (3.19) can be guaranteed
for all t E (tn, tn+l) simultaneously. However, the continuity properties of
k(i}(·, ·)and </>£0 imply that we can replace the original partition by a finer
partition 0 =to< · · · < t& = T (if necessary) such that if (</>£(t)h = 0 for
t E (tn, tn+l), then (3.19) holds for all t E (tn, tn+l)· For simplicity, we will
retain the same notation for the original and refined partition.
For the remainder of the proof we will assume that h and c5 are small
enough that
u~~ol {a~l}' a~2}} c uh,6.

We can then define a nonanticipative control scheme for {~?· 6 , i < oo} in
terms of the a~>'s. For ic5 E (tn, tn+d, we set

h, 6 _ { a~1 ) if (~?· 6 )1 ~ 0
ui - a~2} if (~?,6h > 0.

Define ~h· 6 (·),mh· 6 (·),v(l},h, 6 (·),vC 2 },h, 6 (·) and Th,6 as in the proof of the
lower bound, but for this new control sequence. Because the a~/>'s are all
bounded, the collection

is clearly tight. Extract a convergent subsequence, with limit

( x( ·), m{·), vC 1>(·), vC 2>(·),f) .

Let the Skorokhod representation be used, arid fix any w for which there is
convergence. Let v~ 1 )(·) and v~ 2 )0 be the derivatives of vC 1>(·) and vC 2>(·),
respectively.
Fix n < N, and assume for now that x(tn) = </>£(tn)· It will be proved
below that this implies x(tn+l) = </>£(tn+l), and therefore this assumption
will be justified by induction. First consider the case (</>£(t)h :f; 0 for all
t E (tn, tn+d· Then by the definition of the control scheme

v8 ({an}) = 1 a.s. for s E (tn,tn+l)·

Therefore, x(tn+d = </>£(tn+l) in this case. Next consider the case (</>£(t)h
= 0 for all t E ( tn, tn+l). The definition of the control scheme uh•6 ( ·) over
14.3 Problems with a Discontinuous Running Cost 397

the interpolated time (tn, tn+ 1) implies

(3.20)

a.s. for s E (tn, tn+l). An elementary stability argument that uses the
Lyapunov function f(x) = l(xhl and (3.9), (3.12), (3.17), and (3.20) shows
that (x(s)h = 0 for s E (tn.tn+d· Therefore,

Combining (3.20) and (3.21), we see that

Note that under (3.17), p~1 ) and p~2 ) are uniquely determined by p~1 ) +
p~2) = 1 and

. umqueness
ThIS · Imp r 118U> (da )
· 1·Ies JR.k = PnU> , and , there1ore,
r x· ( s ) = Pn<1>an<1>
(2) (2) •
+Pn On = On a.s. for s E (tn, tn+l)· Smce we have assumed x(tn) =
rp£(tn), this implies x(tn+l) = rp£(tn+l)· By induction, x(s) = rp£(s) for
s E [0, T). Together with r/J£(t) E G 0 for t E [0, T), this implies f > T.
We now use the properties of the v~i) ( ·) shown in the previous paragraph
to prove the upper bound. For the given control scheme, we have

The boundedness of u:;;01 { a~1 ), a~2)} and the dominated convergence the-
398 14. Problems from the Calculus of Variations: Finite Time Horizon

orem then give

lim [1 TI\Th '6f. k(eh• 6 (s),a)mh,.S(dads)


(h,.S)--tO 0 R.k

+ g(eh• 6 (T 1\ rh,.s), T 1\ rh,.s)] (3.22)

= 1T JRk [k< 1 >(x(s),a)v~ 1 l(da) + k< 2>(x(s),a)v~ 2l(da)]


+ g(x(T), T).
Using the properties of the v~i) ( ·) shown in the previous paragraph and the
equality x(·) = ¢€(·), the right hand side of (3.22) is equal to

(3.23)

where

if (¢€(t)h <0
if (¢€(t)h > 0
if (¢€(t)h = 0.

The definition of the an and the a~) imply that (3.23) is bounded above
by
1T k(¢•(s), if>€(s))ds + g(¢•(T), T) + Tf.:::; V(x) + (2 + T)f..
Note that (3.22) and (3.23) hold w.p.l. By again using the boundedness
of u~,:01 { a~1 ), a~2 )} and the dominated convergence theorem, we have

limsupE~ '
h6[1TI\Th,6 f. _k(eh• (s), a)mh• (dads)
6 6
(h,.S)--tO 0 Rk

+ g(eh•6 (T 1\ rh,.s), T 1\ rh,.s)] :::; V(x) + (2 + T)f.,

and, therefore,
lim sup Vh· 6 (x) :::; V(x) + (2 + T)f..
(h,.S)--tO

This proves the upper bound for the case r > T.


The case r :::; T. The only difference between this case and the one treated
previously concerns the limiting behavior of the exit times 1'h,.S. By redefin-
ing the f.-optimal path ¢0 if need be, we can assume the existence of
r+c ·
c > 0 such that ¢(t) ¢ G fortE (r, r+c) and fr k(¢(s), ¢(s))ds:::; f.. For
14.3 Problems with a Discontinuous Running Cost 399

arbitrary t: > 0 and t: 1 > 0, we can find a path ¢l (·) according to Lemma 3.3
such that <jJE(t) E G0 fortE [O,r-t:1) and <jJE(t) ~ G fortE [r,r+t:1]· This
allows us to control the time and location of exit. The proof now follows
the same lines as for the case r > T. •

Proof of Lemma 3.3. Let¢(-) be absolutely continuous with ¢(0) = x


and J;{ k(¢(s), t/>(s))ds < oo. Given t: > 0, there exists 'Y > 0 such that
is- ti ~ 'Y implies i¢(s)- ¢(t)i ~ t:. By assumption, lx- Yi ~ t: implies
lk(i)(x,a)- k(i)(y,a)l ~ /(t:) (M + k(i)(x,a)) for all a E JRk and i = 1,2.
It is easy to show that the definition of k( 0 )(x,a) in terms of k< 1)(x,a)
and k( 2 )(x,a) implies that JkC 0)(x,a)- k(0 )(y,a)l ~ /(t:) (M + k( 0 )(x,a))
for all a E JRk and Jx - yj ~ t:. It is convenient to recall at this point
that the definition also implies k( 0 )(x, a) ~ k( 1)(x, a) t\ k( 2 )(x, a) whenever
(ah = 0.
Define
A(O) = {t: (¢(t)h = 0},

A( 1) = {t: (¢(t)h < 0},


A(2 ) = {t: (¢(t)h > 0}.
Let B(O) be a finite union of intervals [cj, dj], 1 ~ j ~ J, such that di ~ Ci+b
and such that A (O) C B(0). Without loss of generality, we can assume

for 1 ~ j ~ J and that maxi (di - Cj) < 'Y. For simplicity, we assume c1 = 0
and dJ = T. The required changes when this is not the case are slight.
Define
u~ = -d
_ ·.
J
1
CJ
ld·
Cj
3
t/>(s)ds.

If Cj+l > dj, let ej,k ~ Kj, be such that dj = e} < eJ < · · · < efi = ci+1
and maxj(eJ+l- ej) < 'Y· Note that (¢(t)h -=/= 0 fortE (dj, Cj+l)· Define

Finally, we set

uE(t) = t
j=1
(l{tE(cJ,di)}uJ + ~1 l{tE[eJ,e;+l)}uJ) ,
k=1

¢E(t) = x + 1t uE(s)ds.

With these definitions, ¢( t) = ¢' (t) whenever t = Cj. t = db or t = ej.


Clearly,¢'(·)-+¢(·) as t:-+ 0.
400 14. Problems from the Calculus of Variations: Finite Time Horizon

Since l¢(s)- ¢(c;)l ~ f" for s E [c;,d;),

1d; k(o) ( ¢(c;), ¢( s) )ds

~ 1-'

k(¢(c;),¢(s))ds

~ 1.::t.

3
3
( k(¢(s),¢(s)) +/{E) [M + k{¢(s),¢(s))]) ds.

The first inequality in the last display is due to the convexity of each of
the k(i)(x, ·), i = 1, 2, 3, and the fact that k< 0>(x, a) ~ k< 1>(x, a) l\k< 2>(x, a)
whenever (a )I = 0. Convexity and the definition of the uJ imply that

k(0>(¢(c;),uJ)(d;- c;) ~ 1d; k<0>(¢(c;),¢(s))ds.


3

Because ¢(c;) = ¢e(c;) and W(s)- <f>E(c;)l ~ € for s E [cj,d;),

1d; k(O) ( ¢E (8 ), UE ( 8) )ds


3

~ ( k< 0>(¢(c;), uJ) + f(E) [M + k<0 >(¢(c;), uJ)]) (d;- c;).


Combining these last three inequalities, we obtain

1d; k< >(¢E(s),uE(s))ds


0

~ {1+{2+/{E))/{E)) 1.'

3
(k(¢(s),¢(s))+(2+/(E))j(E)M)ds.

A similar estimate applies for each interval of the form [ej,eJ+ 1 ). Com-
bining the estimates over the different intervals gives the last inequality in
the statement of the lemma. Finally, the last sentence of the lemma is a
consequence of the definitions of the c;, d;, and ej. •
15
Problems from the Calculus of
Variations: Infinite Time Horizon

In this chapter we extend the results of the previous chapter to problems


with a potentially infinite time horizon. The control problem will be defined
on a bounded domain, and, as in the last chapter, the process is stopped
and an exit cost is paid when the domain is exited. In contrast to the
last chapter, there is no a priori fixed time interval. Instead, the process
can be stopped at any time prior to exiting the domain, at which time a
stopping cost is incurred. The problem of control until the domain is exited
(without the possibility of controlled stopping) can be obtained as a special
case, simply by choosing a stopping cost that is sufficiently large.
Problems of this type have recently attracted a great deal of interest, and
occur in problems of robust nonlinear control [71], large deviations [59],
computer vision [46, 141], and related problems involving the evolution of
surfaces and boundaries.
Since we are dealing with problems over a potentially infinite time inter-
val, conditions must be imposed which guarantee that the problem is well
posed. We will assume that k(x,o:);::: 0 for all x and o:. Given that k(·, ·)is
nonnegative, there are two fundamentally different cases. In the first case,
we assume the existence of ko > 0 such that k(x,o:);::: ko for all x and o:.
The convergence analysis in this case is relatively straightforward, since the
lower bound on the running cost provides an upper bound on the optimal
stopping time, as in Section 10.6. This case is discussed in Section 15.2.
The remaining case, which occurs frequently in applications, is much
more difficult. Here we have no a priori estimates on the optimal stopping
time, and the convergence theorem for the case of strictly positive running
cost no longer applies. We will consider two different ways of dealing with
402 15. Problems from the Calculus of Variations: Infinite Time Horizon

this issue. One approach is to perturb the running cost so that it is strictly
positive, and then remove the perturbation ash--+ 0. This method is dis-
cussed in Subsections 15.3.1 and 15.3.2. It turns out that one cannot send
the perturbation to zero in an arbitrary way, and even when the perturba-
tion tends to zero at a sufficiently slow rate one must impose conditions on
the "zero cost" sets K(x) ={a: k(x,a) = 0}. These conditions essentially
take the form of a description of the large time behavior of any solutions
to the differential inclusion¢ E K(</J). We first consider the case in which
all solutions to the differential inclusion that start in a neighborhood of G
are either attracted to a single point x 0 E G0 or else leave G. This result is
then extended to cover the case in which the limit set is given by a finite
collection of connected sets.
The alternative approach uses the same numerical scheme (and related
iterative solvers) as Section 15.2, but imposes greater restrictions on K(x).
The set {x: K(x) I- 0} is assumed to be the union of a finite collection of
connected sets, and additional conditions are assumed which imply that V
is constant on each of the sets in this collection. This approach is developed
in Subsection 15.3.3 in the context of an interesting application, namely, a
calculus of variations formulation of a problem in shape-from-shading. Also
included in Subsection 15.3.3 is a weakening of the condition required of G
in which we allow the complement of G to contain isolated points.
When approximating any stationary control problem an important issue
is the efficiency of the associated iterative methods for solving the dis-
cretized equations (see Chapter 6). Many different aspects of this issue are
considered in the chapter. For example, the special case where the running
cost is quadratic in the control (for each fixed value of the state) occurs
in many problems (e.g., large deviations and minimum time problems).
In Subsection 15.2.2 we show that if the "canonical" approximating chain
for a calculus of variations problem is used then the discretized dynamic
programming equation can be solved explicitly. This has significant impli-
cations for performance of the algorithm, since numerical minimizations
are not required for the iterative solvers discussed in Chapter 6. A special
case of this calculation is applied to the shape-from-shading problem in
Subsection 15.3.3, and numerical results are discussed there and for the
general case in Section 15.4. Section 15.4 also discusses other qualitative
aspects of the iterative solvers, and in particular the relation between the
number of iterations required for convergence of the solver and qualitative
properties of the approximating chain. It turns out that one can design
chains and associated iterative schemes that converge in a small number of
iterations, and with the number of iterations essentially independent of the
discretization level. This property makes the Markov chain approximations
very useful for industrial applications of minimum time and related shape
evolution problems. These features of the Markov chain approximations
were first explored in [14], and extensions to higher order accurate schemes
are developed in [48, 145].
15.1 Problems of Interest 403

An outline of the chapter is as follows. In Section 15.1 we formulate the


general problem of interest. The description of the approximation scheme
and statement of the convergence theorem for the case of strictly positive
costs is given in Subsection 15.2.1. In Subsection 15.2.2 we show that the
dynamic programming equations (and related iterative schemes) take a
much more explicit form in the case where k(x, a) is quadratic in a for each
x. Subsections 15.3.1-3 develop the two approaches for nonnegative running
cost discussed above, and give the proofs of convergence. The properties of
the approximations constructed in Sections 15.2 and 15.3 and their relation
to the associated iterative schemes are discussed in Section 15.4, as well as
related implementation issues.

15.1 Problems of Interest


Consider the problem

V(x) = inf [foP k(¢(s),¢(s))ds + g(¢(p))], (1.1)

where the infimum is over all p 2: 0 and absolutely continuous functions


¢ : [0, p] ---+ JRk satisfying ¢(0) = x. In the same way as in the previous
chapter, this problem may be rewritten as an optimal control problem.
Because the stopping time is directly controlled, the Bellman equation takes
the form
inf [.C 0 V(x) + k(x, a)] = 0, x (j B,
{ aEJRk
V(x) = g(x), x E B, V(x) ~ g(x), x (j B,
for some (a priori unknown) stopping set B.
In some problems the state space is already bounded (together with an
appropriate exit cost) as part of the problem formulation. In other problems
this is not the case, and the state space must be bounded before numerical
methods can be applied. In the latter case, remarks analogous to those
for the finite time problem hold concerning methods for modifying (1.1)
in order to achieve this. Both of these cases correspond to the analogue of
(1.1) in which¢(-) is stopped at the first timeT that it leaves the interior
of a given compact set G. The cost thus takes the form

V(x) =inf [fop/IT k(¢(s),¢(s))ds+g(¢(pl\r))], (1.2)

where the infimum is over the same set of paths as (1.1) and all p 2: 0.
Note that g(·) combines the stopping cost and the cost that is added when
the set G0 is exited. Thus, g( ·) will often be discontinuous on 8G.
Owing to the fact that p is potentially unbounded, one must impose
suitable conditions on the running cost k(·, ·) in order to guarantee that
404 15. Problems from the Calculus of Variations: Infinite Time Horizon

the minimal cost is bounded from below. We will make the assumption that
k(x, a:) ~ 0 for all x and a:. The mathematical problem (1.2) may still be
well posed even if this condition is violated. However, because most current
applications satisfy the nonnegativity condition we restrict our attention
to this case.
Except where explicitly stated otherwise, the following assumptions are
used throughout the chapter.

Al.l. The set G is compact and satisfies interior and exterior cone con-
ditions: There exist f > 0 and continuous functions v( ·) and w( ·) such that
given any x E 8G, Uo<a<eBw(x+av(x)) C G and Uo<a<eBea(x+aw(x))n
G=0.
A1.2. The function k(·, ·) is continuous and nonnegative, and it satisfies
the superlinear growth condition (14.1.1). There exist M < oo and f: JR.---+
[0, oo) satisfying f(t)---+ 0 as t---+ 0, such that for all x andy,

sup [lk(x, a:) - k(y, a:) I - f(ix- yi) (M + k(x, a:))] ~ 0.


Ct

A1.3. The function g( ·) is uniformly continuous and bounded when re-


stricted to either of the sets CO and JR.k - G0 •

15.2 Numerical Schemes for the Case


k(x,a) > ko > 0
15.2.1 The general approximation
Let P"'(x, yla:) and ~th(a:) satisfy the local consistency conditions (14.2.1)
and (14.2.2), and let {e~,i < oo} denote the associated controlled Markov
chain.

The Numerical Scheme. Our scheme for approximating the solution to


(1.2) is given by

v•(x) ~ min [u(x), -~ [~P"(x, YI<>)V"(y) + k(x, <>)o<l.t"(<>)]] (2.1)

if x E G~ and Vh(x) = g(x) for x ¢~·We then have the following result.
Theorem 2.1. Assume (A1.1), (A1.2), and (A1.3). Then for the scheme
defined by (2.1) we have

The proof combines the ideas used in the proofs of Theorems 10.6.1 and
14.2.5 and is omitted.
15.2 Numerical Schemes for the Case k(x, a) ~ ko > 0 405

15.2.2 Problems with quadratic cost in the control


A particularly important special case of the problems considered in the
last subsection is that in which the running cost k(x, a) is quadratic in a.
For example, consider a control problem of the following form. Let b( ·) and
a(·) beak-dimensional vector and a k x k matrix of continuous functions
mapping JRk into JR. Given x E JRk, we consider the dynamics

(p = b(¢) + a(¢}v, ¢(0} =X

for t E [0, r], where the control v is square integrable over any interval
[0, T], T < oo. Associated with these dynamics is the cost

1
W(x, v, p) = 2 Jo
r"P [lv(t)l 2 + c(¢(t))] dt + g(¢(r)), (2.2)

where c, g : JRk I-t 1R are continuous. Further assume that v takes values
in JRk and that a(·) = a(·)a'(-) is uniformly positive definite on G. By
defining the running cost
1
k(x,a) = 2(a- b(x))'a- 1 (x) (a- b(x)) + c(x},
the dynamics and the cost above can be rewritten in the calculus of varia-
tions form (1.2).
Note that a more general cost for the original control problem could
also give rise to a calculus of variations problem of the given form. In
particular, in place of lvl 2 in (2.2} one could consider a positive definite
quadratic form [e.g. (v- h(x))'A(x)(v- h(x))] and still obtain a calculus
of variations problem from within the class described above.
Other important examples are minimum time type problems, which can
be converted into this form by a simple transformation of the independent
variable. For example, the standard minimum time problem with dynamics
x = u, lui ~ 1 and cost k(x, u) = 1 can be put into the calculus of variations
form (i.e., dynamics (p = u,u unconstrained} by using the cost k(x,u) =
iul2/4 + 1.
Consider the natural approximating chain of Example 14.2.1. It turns
out that for this class of problems and with this approximating chain the
dynamic programming equation (2.1) can be solved more or less explic-
itly, and moreover the naturally associated Gauss-Seidel iterative scheme
for solving this equation [cf. Section 6.2] is extremely efficient. In all two
and three dimensional problems on which it has been tested, the iterative
scheme converges to the solution after a small number of iterations, with
the number of iterations is essentially independent of the discretization
level h. These properties of the iterative scheme can be best understood by
using its interpretation as a functional of a controlled Markov chain, and
discussion on this point is given in Section 15.4. The purpose of the present
406 15. Problems from the Calculus of Variations: Infinite Time Horizon

section is to indicate how the quadratic cost and properties of the approxi-
mating chain can be exploited to efficiently solve the dynamic programming
equation.
Although we do not discuss the matter here, the degenerate case [i.e.,
when a(x) is not positive definite] is also of interest, and under appropriate
conditions this case can be dealt with just as easily [14]. We also note that
schemes with higher order accuracy can be based on these Markov chain
approximations [48, 145], and that the iterative schemes used to solve for
these higher order approximations inherit the favorable properties of the
lower order scheme considered here.
When solving the dynamic programming equation (2.1) by an iterative
scheme, the infima that must be calculated at each iteration depend on the
procedure chosen (e.g. Jacobi, Gauss-Seidel). However, for all the standard
iterative schemes the quantities that must be computed are always of the
form

where f is a real valued function defined on the grid h?Lk and which de-
pends on the particular iterative method used. Let llall1 = 2::~= 1 lail· For
any x E Gh and for the given ph and tlth, the infimization becomes

l~~ { 11~11 k(x,a) +. L


tE{l, ... ,k},±
ll:t f(x ± hei)} (2.3)

1\ { ~b(x)'a- 1 (x)b(x) + hc(x) + f(x)}.

The second term in the expression above corresponds to the case a = 0.


The evaluation of (2.3) can be carried out by considering a number of
constrained infima. To illustrate, we consider the problem in three dimen-
sions. Except at the origin, in each of the 8 octants defined by the coor-
dinate directions ph(x, yla) and flth(a) are smooth functions of a. Thus
it is reasonable to infimize over each octant separately and then take the
minimum among the obtained infima. Within each octant, the calculation
of the infimum requires that one consider a number of unconstrained and
constrained infimization problems. However, it turns out that all of these
problems can be formulated in terms of the following basic constrained
minimization problem.
Let A be a positive definite, real, symmetric r x r matrix, let B and j3
be vectors in JRr, and let v E (O,oo), "f E [O,oo) be given. Define 1 to be
the transpose of the r-dimensional vector (1, ... , 1).
15.2 Numerical Schemes for the Case k(x, a) 2:: ko >0 407

Lemma 2.2 Consider the problem of infimizing

w\ {~(w-B)'A- 1 (w-B)+w'.B+'Y} (2.4)

over wE JRk satisfying w'1 > 0. Define

s= ( 1' A,B- vB'1) 2 - 1'A1 [.8' A,B- 27v- 2vB'.8].

If s ~ 0 then the minimum of (2.4) equals

r= 1 ,~ 1 [ -vB'1+1'A.B+vfs ], (2.5)

and when s >0 it is uniquely achieved at

=B-~[(.1
w v JJ + (vB'1-1'A.B-vfs)]
1'A1 1 .

If s < 0 then the infimum of (2.4) equals -oo.


Proof. We first consider the constrained minimization problem

min..!:. [~2 (w-B)'A- 1 (w-B)+w'.B+'Y] (2.6)


wEJR.k 1L

subject to w'1 = JL > 0. We associate a multiplier A with the constraint,


differentiate the Lagrangian, and equate the gradient to zero to obtain the
system of equations

Using the constraint to identify A, the solution

w
= B_ ~
v
(f.l + (vB'1-1'A,B-
fJ 1'A1
vp,) )
1

is then substituted into (2.6) to obtain the function

- 1- { -+-+1
1'A1 2
VJL s
2vp,
'A ,B-vB1 I }
'
which is to be infimized over JL > 0. Clearly, if s < 0 then the infimum is
equal to -oo. If s ~ 0 then the derivative with respect to JL has two roots,
of which only the positive root, p,* ~ vfs/v, satisfies the constraint on JL
and gives a local minimum, which is in fact a global minimum. Substituting
p,* in the expression for w gives the form of the minimizer, and substituting
the minimizer in (2.4) yields the minimum value expressed in (2.5). •
408 15. Problems from the Calculus of Variations: Infinite Time Horizon

A Procedure for Solving (2.1). As noted previously, iterative solvers


such as Jacobi and Gauss-Seidel all take the same general form (2.3). The
method for solving this infimization problem involves solving it in each
orthant of JRk and then minimizing over all orthants.
To illustrate the necessary calculations in each orthant, we consider the
case with Oi 2::: 0, i = 1, ... , k. All other cases differ only in notation. The
resulting constrained minimization is

inf
{a:a,~O,aoFO}
[-*{hk(x,o) +
0
Lod(x
i
+ hei)}]·

To simplify notation, we omit the dependence of the functions b(x), u(x)


and c(x) on x and write (It, ... , /k) in place of the vector (/(x+het), ... , f(x
+hek)). With these changes in notation the problem takes the form

inf
{ a:a,~O,aoFO}
[-+-1 {~2 (o-b)'a- 1 (o-b)+hc+o'!}].
0
(2.7)

The solution to (2.7) will depend on whether or not b or cis 0, and thus
we consider the two cases separately.

The Case b = 0 and c $ 0. In this case the infimum in (2.7) is achieved


in the limit o ---+ 0. We first note that if b = 0 and c < 0 then the infimum
in (2.7) is -oo. Since for points satisfying this condition we would also have
V(x) = -oo, we need not consider this situation. If c = 0, the quadratic
term in the minimization tends to zero faster than o'1 as o tends to the
origin. Because of this, the infimum will be obtained only in the limit o ---+ 0.
In fact, if the minimum of!; over j E {1, ... , k} is attained at i, then the
infimum in (2.7) equals /i and is obtained by letting o = 8ei and then
sending 8 _J.. 0.

The Case b =F 0 or c > 0. In this case, the infimum in (2.7) is achieved


away from zero. A key fact that is needed is that there is only one local
minimum for this constrained problem, which equals the global minimum.
This follows from [14, Lemma 7.2].
The procedure used to compute (2.7) is as follows. The first step is to
consider the relative interior of the orthant, temporarily ignoring the non-
negativity constraints on the Oi (although we still assume o'1 > 0). This
allows the application of Lemma 2.2. If the infimum of this problem is finite,
the nonnegativity constraints Oi 2::: 0 are tested for the candidate minimizer.
If these constraints are satisfied, then the unique local minimum has been
found. If there is no candidate minimizer, or if the nonnegativity constraints
are not satisfied, the unique local minimum must be on the boundary of
the orthant. Then the next step will be to search the k - 1 dimensional
faces. Fortunately, the minimizations constrained to each of these can also
be done by applying Lemma 2.2. For instance, let Ok = 0. Just as when
15.3 Numerical Schemes for the Case k(x, a) ~ 0 409

the interior of the orthant was under consideration, the first step is to com-
pute the minimum as if the rest of the variables were unconstrained (save
the constraint a1 + ··· + ak-1 > 0). The solution of this lower dimen-
sional problem requires a change of variables. Define 5 = (at, ... , ak_l)',
b = (bt, ... ,bk-1)', k = (Kt, ... ,Kk-1)' and b = bk. We can then write
(with matrices A 11 , A 12 , A21 , A22 of appropriate dimension)

Lemma 2.2 can be applied once more to compute the minimum of this
quantity. If the solution satisfies the constraints ai 2:0, i= 1, ... , k-1, and if
the gradient of the original objective function points into the k-dimensional
orthant, then the search in the orthant is complete. If not, then the search
continues through the remaining faces, and if necessary, through the lower
dimensional faces. For example, ifk = 3 and the minimum over the orthant
{a 1 2: 0, a 2 2: 0, a3 2: 0} is not found in the interior, the search continues
on the faces {a1 2: O,a2 2: O,a3 = O},{at 2: O,a3 2: O,a2 = O},{a2 2:
0, a3 2: 0, a 1 = 0}. If the search there fails as well, one continues with the
faces {a 1 2: 0, a 2 = a 3 = 0}, etc. Lemma 7.2 of [14] guarantees that the
procedure will find the unique global minimum.

15.3 Numerical Schemes for the Case k(x, a) >0


In this section we extend the results of the previous section to problems
where k(x, a) is nonnegative, but not necessarily strictly positive. When
problems of this sort are discretized, the resulting nonlinear equations do
not usually have unique solutions. As a consequence, the construction of
approximations and the proof of their convergence is much more delicate.
As discussed in Section 15.1, we present two approaches. The first approach,
which is developed in Subsections 15.3.1 and 15.3.2, perturbs the running
cost, and shows that if the perturbation is removed at a sufficiently slow
rate, then under some conditions on the zero sets K(x) ={a: k(x,a) = 0}
the resulting approximations will converge as h -+ 0. The second approach
uses the same approximation as in Section 15.2, but requires that the points
{ x : K (x) i= 0} be the union of a finite collection of connected sets, and uses
conditions that imply V is constant on each of these sets. This approach is
developed for a particular application in Section 15.3.3.
410 15. Problems from the Calculus of Variations: Infinite Time Horizon

15. 3.1 The general approximation


We again assume ph(x, yja) and dth(a) satisfy the local consistency con-
ditions (14.2.1) and (14.2.2) and take {~~,i < oo} to be the associated
controlled Markov chain. Without a positive lower bound on k(·, ·),a naive
discretization of the formally derived Bellman equation can yield incorrect
results. The basic difficulties can be illustrated by a simple example. Sup-
pose 0 E G 0 and that k(O, 0) = 0. Then under (A1.3) the problem (1.2) is
still well defined in the sense that V (x) is bounded from below. However,
there is a difficulty with the evaluation of V(O) in that there is no guar-
antee of the existence of a minimizing path ¢( ·) and associated controlled
stopping time p which satisfy p A inf{t: ¢(t) E 8G} < oo. Furthermore, if
¢(-) is t: -optimal, then so is (¢* ( ·), p*), where p* = p + c,
¢(t- c) fort~ c
<P*(t) = {
0 fort E [0, c],
and c ~ 0 is arbitrary. The facts that p can be arbitrarily large and k(O, 0) =
0 imply that a nearly minimizing path has no need to leave the origin in
any finite amount of time.
Suppose one were tempted to use the scheme (2.1) with ph(x, yja) and
~th(a) as given in Example 14.2.1 to approximate V(x). Recall that in
many cases, equation (2.1) itself must ultimately be solved by one of the
methods discussed in Chapter 6, such as the iteration in policy space
method. It turns out that there are feedback controls uh(x) for which the
associated cost is bounded and for which the matrix defined by
R = {r(x,y),x,y E 8Gh uan,
( ) - { ph(x,yjuh(x)) xEG~
r x,y - 0 x ¢ G~,
8Gh = {y E G~: ph(x,yia) > 0, for some x E G~,a E .IRk}
is not a contraction. (See the discussion in Chapter 6 on the role of the con-
traction condition.) This means that there will not, in general, be unique-
ness for the solutions to (2.1). For a very simple example, assume G~ = {0}
and let ph(x, yja) and ~th(a) be as in Example 14.2.1. Then Vh(O) = c
is a solution to (2.1) for any c ~ infzEG g(x), with the associated feedback
control given by uh(O) = 0. More interesting examples are given later in
this subsection and in Subsection 15.3.3.
In order to circumvent this difficulty, we return to the definition of V(x).
For each TJ > 0 let k'"(x, a) be a function that satisfies (A1.2), and also
k"~(x, a) 2: TJ, k"~(x, a) ..j.. k(x, a)
for all x and a. For example, we can take k"~(x,a) = k(x,a) VTJ. Define

V"'(x) = inf [1pAT k"~(¢(s),~(s))ds + g(</J(p/1. r))], (3.1)


15.3 Numerical Schemes for the Case k(x, a) 2 0 411

where the infimum is over the same functions and stopping times as in
the definition of V(x). Clearly, k11 (·, ·)satisfies the conditions of Subsection
15.2.1. Furthermore, it is quite easy to prove that V11 (x)-!. V(x) as rJ--+ 0.
Based on the strict positivity of k'1(·, ·),the methods of Subsection 15.2.1
can be applied to obtain approximations to V11 (x). If ry > 0 is sufficiently
small, then V11 (x) should be a good approximation to V(x). Although this
is the basic idea behind the algorithm presented below, it turns out that
we cannot send ry --+ 0 and h --+ 0 in an arbitrary fashion. Rather, rJ and h
must simultaneously be sent to their respective limits in a specific manner.

Numerical Schemes. Let ph(x, yla) and fl.th(a) satisfy the local consis-
tency (14.2.1) and (14.2.2). For each ry > 0, we define an approximation
V11h(x) of V11 (x) by

V,:0(x) ~ min [g(x), .~.\'• [~>"(x, ulo)V,'(y) +k"(x, o)At"(o)]] (3.2)

if X E G~ and v;(x) = g(x) for X t/. G~. It follows from the positivity of
ry that (3.2) uniquely defines V11h(x). Our approximation to V(x) will then
be given by Vh(x) = V11~h)(x), where ry(h) tends to zero ash--+ 0.

Remarks on Implementation. The strict positivity of k'1(., ·)for ry > 0


implies that (3.2) has a unique solution. Thus, the approximation V11h(x)
can be obtained by solving (3.2) by any of the methods of Chapter 6 (e.g.,
the approximation in value space method). However, it turns out that the
performance of the algorithm used to solve (3.2) can depend heavily on the
initial condition used to start the algorithm. In particular, choosing too
low an initial condition can cause the algorithm to converge quite slowly.
The effect of a low initial condition on the speed of convergence becomes
more and more severe as the infimum of k(·, ·)tends to zero. On the other
hand, convergence is typically quite fast when a large initial condition is
used for the algorithm. This behavior is, in part, due to the fact that we are
working with a minimization problem and is also due to the deterministic
nature of the controlled process that the Markov chain is approximating
[i.e., ¢(t) = u(t)]. More extensive remarks on these points will be given in
Section 15.4.

15.3.2 Proof of convergence


As remarked above, it will ultimately become necessary to impose a condi-
tion on the manner in which ry(h) tends to zero. However, this condition is
not needed for the upper bound, and, therefore, we separate the statements
for the different bounds into two theorems. The first deals with the upper
bound. We will use the same notation and definitions as in the proof of
Theorem 14.2.4 save that, for each h > 0, we replace k(·, ·)by krt(h)(·, ·).
412 15. Problems from the Calculus of Variations: Infinite Time Horizon

Theorem 3.1. Assume (ALl), (A1.2), and (Al.3) and that 71(h) > 0 is
any sequence satisfying 71(h)-+ 0 ash-+ 0. Then

lim sup V~h)(x) ~ V(x).


h-tO

Proof. Let any e > 0 be given. Because k"(·, ·) . ). k(·, ·), there exists a
relaxed control m( ·) and an associated stopping time p < oo such that if

x(t) = x + t ]IR~c
lo
f am(dads), r = inf{t: x(t) E 8G},

then

1p/\T1
0 JRic
k11 (x(s),a)m 8 (da)ds + g(x(p 1\ r)) ~ V(x) +f
for all sufficiently small 7J > 0. Using Theorems 14.2.2 and 14.2.3 in the
same way as they were used in the proof of Theorem 14.2.5 we can extend
the definition ofm(·) beyond T such that m(·), x(·), and p have the following
properties. Either
x(·) E CO for all t E [O,p] (3.3)
or p > T, and there is v > 0 such that
x(t) E G 0 fortE [0, r), x(t) fl. G fortE (r, T + v],
1t k11 (x(s),a)m 8 (da)ds-+ 0 as t..).. T
{3.4)

uniformly in all sufficiently small 7J > 0. The remainder of the proof for
these two possibilities is analogous to the proof of Theorem 14.2.5 for the
two cases of (14.2.22) and {14.2.23). We will give the details only for the
case of {3.3). The proof for (3.4) is a straightforward combination of the
ideas used in the proofs for the cases {3.3) and {14.2.23).
By Theorem 14.2.2 we may assume the existence a finite set UE 1 C JRk,
i5 > 0, and an ordinary control uE 1 ( ·) with the following properties. uE 1 ( ·)
takes values in UE 11 is constant on intervals of the form (ji5,ji5 + 15), and if
xE 1 ( ·) is the associated solution, then

and

sup
09:5p
Ilot Jf
JRk
t k(xE (s),uE (s))dsl ~fl.
k(x(s),a)m 8 (da)ds-
lo
1 1

In order to apply UE 1 (.) to the chain {ef' i < 00}' we recursively define
the control applied at discrete time i by uf = UE 1 ( tf) until Ph = tf+l ~ p,
15.3 Numerical Schemes for the Case k(x,a) ~ 0 413

where tf+I = tf + ~th(uf). By construction, the interpolated stopping


times Ph are actually deterministic and satisfy

Ph ---+ P·

The proof from this point on is an easy consequence of the fact (due to
Theorem 14.2.4) that sup0 ~t~Ph J~h(t)- x€ 1 (t)J ---+ 0 in probability, and is
the same as that of the finite time problem treated in Theorem 14.2.5. •

It will turn out that much more is required for the lower bound, including
a condition on the rate at which TJ(h) ---+ 0. In the next example we show
that if TJ( h) ---+ 0 too quickly, then

lim sup V~h)(x) < V(x)


h--+0

is possible.

Example. We consider a two dimensional problem in which G = {(xt, x2) :


Jxtl V Jx2J :5 1} and k(x,a) = Ja1 - a2J for JaJ :5 2. We also assume
that k(·, ·) is defined for JaJ > 2 in such a way that {Al.2) holds [e.g.,
k(x, a) = Ja 1 - a 2J V (JaJ 2 - 2)]. The zero cost controls therefore include
the set {(at,a2): a1 = a2,JaJ :52}. We consider a stopping cost g(·)
that satisfies g(x) = 1 for x E G0. We will also assume that the restriction
of g(·) to 8G is continuous, with g(x) ~ 0 for all x E 8G, g(x) = 0 if
x E {x E 8G: Jx1 +x2J :5 1/2}, and g{1,1) = g{-1,-1) = 1. Thus,
the stopping cost on aa is strictly positive in a neighborhood of each of
the northeast and southwest corners (1, 1) and (-1, -1), and identically
equal to zero in a neighborhood of each of ( -1, 1) and {1, -1). For Sh, Gh,
ph(x, yJa), and ~th(a) we will use the natural choices given in Example
14.2.1.
It is not hard to see that our assumptions on the running and stopping
costs imply V(O, 0) > 0. However, as we now show, one can choose TJ(h) > 0
such that
l~ V11~h)(O,O) = 0 < V(O,O).

Clearly, lim infh--+0 V11~h) {0, 0) ~ 0. To prove lim suph--+O V11~h) {0, 0) :5 0, we
will exhibit a control scheme and TJ(h) > 0 that achieves cost arbitrarily
close to zero. For each i, let

u'!- = { ( -1, -1) for <~rh + (~fh >o


' (1,1) for {~fh + <~rh ::::; o.

Note that the running cost for this control process is identically zero.
With this definition of the control, the process {~f, i < oo} always sat-
isfies J(~fh + {~f)2J < 1/2 (for small enough h). Thus the control scheme
414 15. Problems from the Calculus of Variations: Infinite Time Horizon

accumulates no running cost and also prohibits the process from exiting
near either the northeast or southwest corners (where the exit cost is high).
However, it does allow exit near either of the other two corners, where the
exit cost is zero. The process { (~f) 1 - ( ~f )2, i < oo} is easily recognized as a
symmetric random walk on {ih, i E ..Z}. As is well known, this implies that
for any given value M, there exists i < oo such that l(~fh- (~fhl 2:: M
(w.p.1). Therefore, given any t: > 0 and the initial condition ~8 = (0, 0),
there exists n < 00 such that the process ~f will enter aa h at some point
x [with g(x) = OJ by discrete time n and with probability greater than
1 - t:. Thus, lim sup71 --to V,:O(O, 0) = 0, which demonstrates the existence of
'fl(h) > 0 such that V71~h}(O,O)--+ 0.•

A Condition On the Rate at Which 11(h) -+ 0. Let _vh(x,yia) and


~th(a) satisfy (14.2.1) and (14.2.2). In particular, let c(h) be such that

lo(lal~th(a))l ~ c(h)iai~th(a),

where o(lal~th(a)) represent the "error" terms in the local consistency


equations (14.2.1) and (14.2.2) and c(h)--+ 0. For example, if the transition
probabilities of Example 14.2.1 are used then we can take c(h) =h.
We then impose the following condition on the rate at which 'f/ = TJ(h)
can tend to zero:
'fl(h)fc(h)--+ oo (3.5)
ash--+ 0.

The absence of an a priori bound on the optimal stopping times for


nearly optimal approximating processes {~r, i < oo} means that we will
need yet another assumption besides (3.5) in order to prove convergence.
As remarked at the beginning of this section, the assumption takes the form
of conditions that are placed on the zero sets K(x) = {a: k(x,a) = 0}
in order to prove the lower bound. We begin by considering a very simple
example of such an assumption. This assumption will be weakened and
discussed further in (A3.2). For a set or point A, define N-y(A) = {x :
d(x, A) ~ 'Y}, where d(x, A) is the Euclidean distance between x and A.
A3.1. There exist a point Xo E ao
and an open set Gl containing G with
the following properties. Given any 'Y > 0, there exists T < oo such that
for all t 2:: T,

x(O) E G0 , x(s) = { am 8 (da) (a.s.)


JJRk
and
t f
Jo JJRk
k(x(s),a)m(dads) = 0
15.3 Numerical Schemes for the Case k(x,a) ~ 0 415

imply

Remarks. Thus, solutions with zero running cost either are attracted to
the point x 0 or else leave on open neighborhood of C in finite time. The
sets K(x) often have a particular significance for the underlying problem
which gave rise to the calculus of variations problem. See the book [59]
and the example of the next subsection. Note that we may, of course, have
points x where K(x) = 0.

Theorem 3.2. Assume (ALl), (A1.2), (A1.3), and (A3.1). Then for the
scheme defined by (3.5) and (3.2), we have

liminfVh(x) ~ V(x).
h--+0

Before giving the proof, we state a lemma. The proof of a generalization


of the lemma will be given later in this section.

Lemma 3.3. Assume (A1.2) and (A3.1). Then given any 1 > 0 and
M < oo, there is T < oo such that if x(t) = JJRk amt(da) (a.s.) for an
admissible relaxed control m(·), and if x(t) E C- N-y(xo) fortE [O,T],
then J{ JJRk k(x(s),a)m(dads) ~ M.

As noted previously, a difficulty with applying weak convergence meth-


ods directly to this problem is the lack of information regarding the large
time behavior of limits of the interpolated Markov chains. Consider the
case when the stopping cost is so high that the controlled stopping option
is never exercised. Suppose the interpolated processes corresponding to a
given sequence of controls spend an ever larger amount of time in an ever
smaller neighborhood of x 0 , before moving out of C 0 . In particular, assume
that the limit paths of the process are all identically equal to x 0 . Then these
limit paths never exit C 0 , and the substantial control effort needed to move
the process from xo to 8C will not appear in the limit of the running costs.
The technique we will use to avoid this problem is to consider only those
sections of the sample path when the process is outside of a neighborhood of
xo. Unfortunately, keeping track of such detailed information regarding the
process will require some additional notation. A consequence of Lemma
3.3 will be that for small h the interpolated processes can only spend a
finite amount of time in this region without building up an arbitrarily large
running cost. It turns out that the uniformity given in the last statement
of Theorem 14.2.4 allows us to concentrate on these sections of the sample
paths (regardless of when they occur), and thus prove the lower bound.

Proof of Theorem 3.2. It is quite easy to show under (ALl) and (Al.2)
that given f > 0 there is 1 > 0 such that lx-yJ ::; 1 implies V(x) ~ V(y) -E
for all x, y E C [i.e., V (·) is uniformly continuous on C]. Thus, for any E > 0,
416 15. Problems from the Calculus of Variations: Infinite Time Horizon

there is 'Y > 0 such that IV(x)- V(xo)l ~ f for x E N-y(xo). For this 'Y and
M = V(xo) + 1, choose T according to Lemma 3.3.
We first consider the case when x E N-y(xo). Owing to the definition
ef,
of Vh(x) there is a controlled Markov chain { uf, i < oo} that satisfies
e~ = X and a controlled stopping time N h such that

(Nh/\Mh)-1
Vh(x):::: E;h L k"'(h)(ej, uj)ath(uj) + E;h g(e'Nhi\Mh)- f, (3.6)
j=O

where Mh is the time of first exit from 00. Let eh(·) and uh(·) be the
continuous parameter interpolations of { ef' i < 00} and {uf' i < 00}' re-
spectively, and let

Nh-1 Mh-1
Ph= L ath(uf), Th = L ath(uf).
i=O i=O

We can then rewrite (3.6) as

h [Phi\Th h
Vh(x):::: E: lo k"'<h>(eh(s), o:)mh(do:ds) + E: g(eh(Ph Arh))- f.,
(3.7)
where mh(·) is the relaxed control representation of the ordinary control
uh(·). The superlinearity condition (14.1.1) implies

for some fixed b < oo. Choose s(h) -t oo such that c(h)s(h) -t 0 and
TJ(h)s(h) -t oo. This is always possible under (3.5). When combined with
c(h)s(h) -t 0, the last equation implies

By Theorems 14.2.4 and 9.1.7, we can assume (eh(·),mh(·)) -t (x(·),m(·))


w.p.1 and that

w.p.l.
Fix an w E n such that these convergences hold. Define
15.3 Numerical Schemes for the Case k(x,a) 2:: 0 417

If no such j exists, we set ih = oo. Let


ih-1
ah = L ~th(u~),
i=O

and define
Th = { (Ph 1\ Th)- Uh, Uh <Ph 1\ Th
0, (Jh ;::: Ph 1\ Th·
Thus, Th is the interpolated time between Ph 1\ Th and the time before
that when ~h(·) last left N...,(x 0). We will now carefully examine the sample
paths over the interval [ah, (Ph 1\ Th)].
Define t(-) = ~h(· + ah) and define mh(-) to be the relaxed control
representation of the ordinary control uh(· + ah)· Finally, let vh = (Ph 1\
Th)/s(h) and

For the fixed w, consider the set

{(mh(-),Th,ph,Th,uh,rh,vh) ,h > 0}.

Since { mh(-), h > 0} is always precompact on the space 'R,(lll x [0, oo)), we
can extract a convergent subsequence with limit denoted by (m( ·), T, p, r, a,
r,v).

The case v > 0. Since vh = (Phi'ITh)/ s(h), this case implies that Phi'ITh --7 oo
quite fast. In fact, the condition 17(h)s(h) --7 oo implies

and so this case need not be considered further.

For the remainder of the proof we can assume v = 0. This implies that
Ph 1\ Th < s(h) for sufficiently small h > 0 and, in particular, gives the
estimate {3.8) without s(h) appearing. We can also exclude the case r = oo
because in this case the running costs again will tend to oo. An argument
very similar to that used in Theorem 14.2.4 shows that if

{3.9)

then m(JRk x [0, T]) = T and

t f amh(dads) --7 ft f am(dads)


lo lm.k lo lm.k
418 15. Problems from the Calculus of Variations: Infinite Time Horizon

as h -t 0 fortE [0, T]. For the given w it follows from (3.8) that

(e\), mh(-), Th,eh(·), mh(·),ph, Th, ah)


-t (x(·), m(·), T, x(·), m(·), p, r, a),

where
x(t)- x(O) = ft f am(dads)
lo JJRk
fortE [0, T]. Note that by construction x(O) E N-y(x 0 ) and x(t) fl. N-y 12 (xo)
fortE (0, Tj. Thus, we are examining the limits of eh(·) after the processes
have left N-y/ 2 (xo) for the last time. Note also that it may be the case that
a= oo, in which case x(·) cannot be determined from x(·).

The case p 1\ T < oo. For this case, we are essentially back in the setting of

1 1
the finite time problem, and the proof of Theorem 142.5 shows that

+ g(eh(Ph A rh)) 2: V(x).


Ph11Th
liminf k11 (h)(eh(s),a)mh(dads)
h-tO 0 JRk
(3.10)
For the remainder of the proof we can assume p = T = oo. Recall that,
by construction, x(O) E N-y(x 0 ) and x(t) E G- N-yj 2 (x 0 ) fort E [0, T].

The case T = oo. If T = oo, then by Lemma 3.3

(3.11)

The case T < oo. By Fatou's lemma

(3.12)

where g*(x) = lime-+oinf{g(y): lx- Yl ~ €}. As noted in the remarks fol-


lowing (A14.2.1)-(Al4.2.3) and also in the proof of Theorem 14.2.5, (AI. I)
and (A1.2) imply that V(x(O)), and therefore V(xo) -E, give a lower bound
for the right hand side of (3.12).

If we combine all the cases and use the assumption that x E N-y(xo), we
obtain
15.3 Numerical Schemes for the Case k(x, a) ~ 0 419

w.p.l. Therefore,
liminfVh(x) ~ V(x)- 3E uniformly for all x E Ny{x 0 ) (3.13)
h.--tO

for the original sequence h.


To complete the proof we must consider an arbitrary point in G⁰. Define
the processes ξ^h(·) and m^h(·) in the usual fashion for any nonanticipative
control sequence for which the associated running costs are bounded. Let
τ_h = inf{t : ξ^h(t) ∉ G⁰}, τ̄_h = inf{t : ξ^h(t) ∈ (ℝ^k − G⁰) ∪ N_γ(x₀)}, and let
ρ_h be the continuous time interpolation of the stopping time N_h. Extract
a convergent subsequence from the tight collection of random variables
{(ξ^h(·), m^h(·), ρ_h, τ_h, τ̄_h), h > 0} and let (x(·), m(·), ρ, τ, τ̄) denote the limit.
We assume the convergence is w.p.1, and for each ω ∈ Ω for which there
is convergence we consider separately the possible cases: (1) ρ ≤ τ̄, ρ < ∞;
(2) τ = τ̄ < ρ; (3) τ̄ < τ ≤ ρ; (4) τ̄ = τ = ρ = ∞. For the cases (1) and (2)
we are again dealing with a finite time problem (as far as the convergence
of the costs is concerned), and we have

For case (3), the definition of τ̄_h implies ξ^h(τ̄_h) ∈ N_γ(x₀) for all small
h > 0. Using the Markov property and the previous estimates for paths
that start inside N_γ(x₀), we have

where

Ṽ(x) = inf { ∫₀^T k(φ(s), φ̇(s)) ds + V(φ(T)) : φ(0) = x, φ(T) ∈ N_γ(x₀), T > 0 }.

By the dynamic programming principle of optimality, Ṽ(x) is bounded
below by V(x). For the last case of τ̄ = τ = ρ = ∞, Lemma 3.3 gives

liminf_{h→0} ∫₀^{ρ_h∧τ_h} ∫_{ℝ^k} k^{η(h)}(ξ^h(s), α) m^h(dα ds) = ∞.

Combining the four cases and using an argument by contradiction gives

liminf_{h→0} V^h(x) ≥ V(x) − 3ε.

The proof of the theorem is completed by sending ε → 0. •

The proof of Theorem 3.2 can easily be extended. One such extension
uses the following generalization of (A3.1).
A3.2. There exists a finite collection of disjoint compact sets {K_i, i =
1, ..., N} and an open set G₁ containing G with the following properties.
We require that ∪_{i=1}^N K_i ⊂ G⁰, and that given any γ > 0, there exists
T < ∞ such that for all t ≥ T,

x(0) ∈ G⁰,   ẋ(s) = ∫_{ℝ^k} α m_s(dα)  (a.s.),
and
∫₀^t ∫_{ℝ^k} k(x(s), α) m(dα ds) = 0
imply

Remark. Although the assumption has a somewhat complicated statement,
it is not very strong. The assumption is similar to and motivated by [59,
assumption (A), p. 169].
We will also need the following extension of Lemma 3.3. The proof of
this lemma is given at the end of the section.
Lemma 3.4. Assume (A1.2) and (A3.2). Then given any γ > 0 and M <
∞, there is T < ∞ such that if ẋ(s) = ∫_{ℝ^k} α m_s(dα) (a.s.) for an admissible
relaxed control m(·), and if x(t) ∈ G − (∪_{i=1}^N N_γ(K_i)) for t ∈ [0, T], then
∫₀^T ∫_{ℝ^k} k(x(s), α) m(dα ds) ≥ M.
Theorem 3.5. Assume (A1.1), (A1.2), (A1.3), (A3.2), and that V(·) is
constant on each K_i. Then for the scheme defined by (3.5) and (3.2), we
have
liminf_{h→0} V^h(x) ≥ V(x).

Outline of the Proof. The proof is essentially the same as that of The-
orem 3.2 but requires a good deal more notation. The constructions that
would be used are very similar to those that are used in the proof of The-
orem 3.7 below. An outline of the proof is as follows. Suppose that the
quantities ξ^h(·), m^h(·), τ_h, ρ_h, and so on, are defined as in Theorem 3.2.
Assume the Skorokhod representation is used and that a convergent sub-
sequence is under consideration. If the limit of ρ_h ∧ τ_h is finite, we are
dealing with a finite time problem (at least as far as the convergence of
the costs is concerned). We therefore assume that this is not the case. By
suitably defining random times, we can keep track of the excursions of the

sample paths taken between visits to small neighborhoods of the K_i and
∂G. Lemma 3.4 guarantees that if the running cost is to be bounded, then
the interpolated time that the process spends outside a neighborhood of the K_i,
up until it exits G⁰, is bounded. Thus, we can apply Theorem 14.2.4 as
it was applied in Theorem 3.2 and get a lower bound on the running cost
accumulated during any excursion between neighborhoods of the K_i (say
from K_i to K_j) of the form

inf { ∫₀^T k(φ, φ̇) ds : φ(0) ∈ N_γ(K_i), φ(T) ∈ N_γ(K_j), T > 0 } − ε.

A similar statement holds regarding excursions between a neighborhood of
one of the K_i's and a neighborhood of ∂G. Because the process ξ^h(·) even-
tually must either exit G⁰ or be stopped at some finite time ρ_h, a dynamic
programming argument gives the lower bound for points in ∪_{i=1}^N N_γ(K_i).
For the points in G⁰ that are not in ∪_{i=1}^N N_γ(K_i), we use the same argu-
ment as in Theorem 3.2. By Lemma 3.4, the process ξ^h(·) must either exit
G⁰ or enter a small neighborhood of ∪_{i=1}^N K_i in a uniformly bounded time,
or else accumulate an infinite running cost (in the limit h → 0). Thus, we
again have the lower bound via a dynamic programming argument. •

Proof of Lemma 3.4. The proof is very close to that of Lemma 2.2 in
[59]. We first claim there exist T₁ < ∞ and c > 0 such that x(t) ∈ G −
(∪_{n=1}^N N_γ(K_n)) for t ∈ [0, T₁] implies ∫₀^{T₁} ∫_{ℝ^k} k(x(s), α) m(dα ds) ≥ c. If not,
we can find T_i → ∞, x_i(0) ∈ G − ∪_{n=1}^N N_γ(K_n), and m_i(·) ∈ R([0, T_i] × ℝ^k),
such that for x_i(·) defined by

x_i(t) − x_i(0) = ∫₀^t ∫_{ℝ^k} α m_i(dα ds)

we have x_i(t) ∈ G − (∪_{n=1}^N N_γ(K_n)) for all t ∈ [0, T_i] and

∫₀^{T_i} ∫_{ℝ^k} k(x_i(s), α) m_i(dα ds) → 0.

We can choose a convergent subsequence such that

(x_i(0), x_i(·), m_i(·), T_i) → (x(0), x(·), m(·), ∞),

where m(·) ∈ R([0, ∞) × ℝ^k),

x(t) − x(0) = ∫₀^t ∫_{ℝ^k} α m(dα ds),

and x(t) ∈ G − (∪_{n=1}^N N_γ(K_n)) for all t < ∞. By Fatou's lemma we have

∫₀^∞ ∫_{ℝ^k} k(x(s), α) m(dα ds) = 0,

which contradicts (A3.2). Thus, there are T₁ < ∞ and c > 0 satisfying the
conditions of the claim. The conclusion of the lemma now follows if we take
T = T₁M/c. •

15.3.3 A shape from shading example


In this section we will consider numerical schemes for a shape from shad-
ing problem. For the sake of brevity we will consider only one particular
setup and refer the reader to [46, 124] for generalizations. We also refer
the reader to [72] for questions regarding terminology and modelling. In
our approach to the shape-from-shading problem, we will use a calculus of
variations problem whose minimal cost function describes the surface to be
reconstructed. Most of this section is taken from [46]. A related approach
appears in [133], although the assumptions are quite different from those
used here.
The formulation of our particular shape from shading problem is as fol-
lows. Consider a surface in ℝ³ given in the explicit form S = {(x₁, x₂, x₃) :
x₃ = f(x₁, x₂)}. The function f(·) appearing in the description of S is called
the height function, and we will write x for (x₁, x₂). Suppose that the sur-
face is illuminated from the positive x₃ direction by a point light source
that is assumed infinitely far away, and that the reflected light is recorded
in an imaging plane that is parallel to the plane {(x₁, x₂, x₃) : x₃ = 0}.
Assume that the "recorded light" is characterized in terms of a determin-
istic intensity function I(x), where x identifies the (x₁, x₂) coordinates of
a point on the imaging plane. Under a number of additional assumptions,
including the assumption that the surface is "Lambertian," the surface S
and the intensity function I(·) are related by the equation

I(x) = (1 + |f_x(x)|²)^{-1/2}     (3.14)

in regions where f(·) is continuously differentiable. Thus, I(x) equals the
absolute value of the x₃-component of any unit vector orthogonal to S
at (x, f(x)). We also have I(x) ∈ (0, 1]. We will refer to the points where
I(x) = 1, which obviously include all local maximum and minimum points
of f(·), as the singular points of I(·).
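Relation (3.14) is straightforward to evaluate once the height function has been sampled on a grid. The following short sketch is our own illustration (the grid spacing h and the use of central differences are assumptions, not part of the text); a discrete computation of this type is also what is used later in this subsection to separate discretization error from iteration error.

```python
import numpy as np

def intensity_from_height(f, h):
    """Evaluate (3.14) on a grid: I = (1 + |f_x|^2)^(-1/2), with the partial
    derivatives of the height function approximated by finite differences.
    f is a 2-d array of height samples, h is the grid spacing."""
    fx1, fx2 = np.gradient(f, h)   # central differences in the interior
    return (1.0 + fx1**2 + fx2**2) ** -0.5
```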
In the case of a single local minimum point, I(·) determines the surface
up to a vertical translation. However, in the general case, (3.14) does not
determine the function f (·) even with this sense of uniqueness. The assump-
tion we will use is that the height function is known at all local minima.
Because the minimum points are distinctive, it is likely that their heights
and the local nature of the surface could be determined by some other
method (e.g., stereo). An alternative to a priori knowledge of the heights
of the local minima that is currently under investigation is based on recon-
structing the surface with given (possibly incorrect) values assigned to the
heights of the minima and then estimating the correct relative difference

between the heights based on this reconstruction. We will also assume that
an upper bound B is available for f(·) on G. The set G is given a priori,
and represents the subset of the imaging plane on which data is recorded.
G is often larger than the domain on which the reconstruction of f(·) is
desired. We say that a set A ⊂ ℝ² is smoothly connected if given any two
points x and y in A, there is an absolutely continuous path φ : [0, 1] → A
such that φ(0) = x and φ(1) = y.
A3.3. Let H ⊂ G be a compact set that is the closure of its interior,
and assume H is of the form H = ∩_{j=1}^J H_j, J < ∞, where each H_j has a
continuously differentiable boundary. Assume that f_x is continuous on the
closure of G, and that K = {x : I(x) = 1} consists of a finite collection of
disjoint, compact, smoothly connected sets. Let L be the set of local minima
of f(·) inside H, and define n_j(x) to be the inward (with respect to H)
normal to ∂H_j at x. Assume that the value of f(·) is known at all points
in L, and that f_x(x)'n_j(x) < 0 for all x ∈ ∂H ∩ ∂H_j, j = 1, ..., J.
Remarks. It turns out that the minimizing trajectories for the calculus
of variations problem to be given in Theorem 3.6 below are essentially the
two dimensional projections of the paths of steepest descent on the surface
represented by the height function. Thus, the assumptions that are placed
on H in (A3.3) guarantee that any minimizing trajectory that starts in H
stays in H. Theorem 3.6 shows that the height function has a representation
as the minimal cost function of a calculus of variations problem that is
correct for all points in the union of all sets H satisfying (A3.3). If we
consider an initial point x E G such that the minimizing trajectory exits
G, then we cannot construct f (·) at x by using the calculus of variations
representation because this reconstruction would require I(x) for values of
x outside G. If we assume that the height function is specified at the local
maximum points, then we can consider an analogous calculus of variations
problem with a maximization.
The following theorem is proved in [124].

Theorem 3.6. Assume (A3.3), and for x ∈ ℝ² and α ∈ ℝ² define

g(x) = f(x) for x ∈ L,    g(x) = B for x ∉ L,

and

k(x, α) = (1/2)|α|² + (1/2)(1/I(x)² − 1) = (1/2)|α|² + (1/2)|f_x(x)|².

Define

V(x) = inf [ ∫₀^{ρ∧τ} k(φ(s), φ̇(s)) ds + g(φ(ρ ∧ τ)) ],

where τ = inf{t : φ(t) ∈ ∂G ∪ L} and the infimum is over all ρ > 0 and
absolutely continuous functions φ : [0, ρ] → G that satisfy φ(0) = x. Then

V(x) = f(x)

for all x ∈ H.

Remark. If g(x) is set to a value that is less than f(x) for some x ∈ H,
then, in general, we will not have V(x) = f(x) for x ∈ H. For example, if y
is any point at which I(y) = 1 and if g(y) < f(y), then V(y) ≤ g(y) < f(y).

We next present a numerical procedure for solving for V(x). One feature
of this problem that is quite different from those considered previously is the
nature of the target set. For example, consider the case when L = {x₀} for
some x₀ ∈ G⁰. The target set is then ∂G ∪ {x₀}, and (A1.1) does not apply.
The interior cone condition holds, but the exterior cone condition fails. The
exterior cone condition was used in the proof of the upper bound for all
convergence theorems that have been presented so far in this chapter. In
those proofs, if an optimal (or ε-optimal) path φ(·) terminated on a target
set ∂G at time ρ, then the exterior cone condition was used to define φ(·)
on a small interval (ρ, ρ + ν], ν > 0, in such a way that the added cost
was arbitrarily small and φ(t) ∉ G for t ∈ (ρ, ρ + ν]. This φ(·) was then
used to define a control scheme for the chain, and because φ(·) had been
constructed in this way, the exit times of the chain converged to ρ. See, for
example, Subsection 14.2.3. If the target set does not satisfy an exterior
cone condition, then this construction is no longer possible. Target sets such
as an isolated point are typically difficult to deal with when proving the
convergence of numerical schemes. A common technique is to replace the
target set A by N_γ(A), γ > 0, prove an appropriate convergence property
for the problem with this target set, and then send γ → 0. We will show in
this subsection that this "fattening" of the target set is not needed when a
mild additional condition on the chain is assumed to hold.
Let V_T(x) denote the optimal cost if the controlled stopping time is
restricted to the range [0, T]. Our assumption that B is an upper bound
for f(·) on G implies that it is never optimal to stop at a point in G − L
whenever T is sufficiently large. The stopping cost for points in G − L was
actually introduced in the definition of the calculus of variations problem
of Theorem 3.6 solely for the purpose of forcing optimal trajectories to
terminate in L. This use of a stopping cost could be avoided altogether if
the minimization in the calculus of variations problem were only over paths
that terminate in L at some finite time. However, this added constraint
would be rather difficult to implement in the numerical approximations.
We will see below that because g(·) is the proper stopping cost to introduce
to force the trajectories to terminate in L, it also provides the proper initial
condition for the numerical scheme.

Because the target set can possibly contain isolated points that may not
be included in G_h, we really need to introduce a "discretized target set"
L_h ⊂ G_h, and redefine g(·) in the obvious way. We would need that L_h → L
in the Hausdorff metric [i.e., d(x, L) ≤ ε_h for all x ∈ L_h, d(x, L_h) ≤ ε_h for
all x ∈ L, and ε_h → 0]. To simplify the notation we will just assume
L ⊂ G_h.
Unlike the general situation of Subsection 15.3.1, for this problem we can
approximate the finite time problem V_T(x) and then send T → ∞ and the
discretization parameters to their limits in any way we choose. This allows
the use of the cost k(·, ·), rather than k^{η(h)}(·, ·), in the following alternative
to the method of Subsection 15.3.1.

A Numerical Scheme. Define V_0^h(x) = g(x), and recursively define

V_{n+1}^h(x) = min [ g(x),  min_α [ Σ_y p^h(x, y|α) V_n^h(y) + k(x, α) Δt^h(α) ] ]     (3.15)

if x ∈ G_h⁰ and V_{n+1}^h(x) = g(x) for x ∉ G_h⁰. Finally, define

V^h(x) = lim_{n→∞} V_n^h(x).     (3.16)

The existence of the limit in the definition of V^h(x) follows from the
monotonicity V_{n+1}^h(x) ≤ V_n^h(x). Note that we do not specify V^h as the
solution to a dynamic programming equation. This would not be correct,
since the only possible dynamic programming equation has multiple so-
lutions. Instead, V^h is defined as the limit of an iterative scheme with a
particular initial condition, which is chosen to pick out the "right" solution.

Remark. The iteration (3.15) is of "Jacobi" type (see Chapter 6). Conver-
gence can also be demonstrated when (3.15) is replaced by a "Gauss-Seidel"
type of iteration [46].

A Simple Recursive Equation. In Subsection 15.2.2 it was shown that
the dynamic programming equation takes a particularly simple form when
the transition probabilities of Example 2.1 are used and if the running cost
is quadratic. Applying the results of that subsection to the right hand side
of (3.15), one obtains the following explicit formula. For simplicity, we omit
the intermediate calculations and present only the conclusions. For x ∈ G_h⁰,
let v₁ and v₂ be the smallest values from the sets

{V_n^h(x + h(1, 0)), V_n^h(x − h(1, 0))}  and  {V_n^h(x + h(0, 1)), V_n^h(x − h(0, 1))},

respectively. Define m = (1/I²(x)) − 1. If 0 ≤ h²m < (v₁ − v₂)², then we
use the recursion

V_{n+1}^h(x) = min [ g(x), (v₁ ∧ v₂) + h m^{1/2} ].

If h²m ≥ (v₁ − v₂)², then we use

V_{n+1}^h(x) = min [ g(x), (1/2) [ (2h²m − (v₁ − v₂)²)^{1/2} + (v₁ + v₂) ] ].
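The explicit update translates directly into code. The following sketch is our own illustration of one Jacobi sweep of (3.15) using the formula just given; the array layout, the boolean `interior` mask for G_h⁰ (assumed not to touch the edge of the arrays), and the function names are assumptions made for the example.

```python
import numpy as np

def jacobi_sweep(V, g, I, h, interior):
    """One Jacobi iteration of (3.15) using the explicit update above.
    V, g, I are 2-d arrays on the grid; `interior` is a boolean mask of G_h^0;
    points outside the mask keep the value g (the stopping/boundary cost)."""
    Vnew = g.copy()
    m = 1.0 / I**2 - 1.0                      # m = 1/I(x)^2 - 1 = |f_x|^2
    for i, j in zip(*np.where(interior)):
        v1 = min(V[i + 1, j], V[i - 1, j])    # smallest neighbor in x1
        v2 = min(V[i, j + 1], V[i, j - 1])    # smallest neighbor in x2
        if h * h * m[i, j] < (v1 - v2) ** 2:
            u = min(v1, v2) + h * np.sqrt(m[i, j])
        else:
            u = 0.5 * (np.sqrt(2 * h * h * m[i, j] - (v1 - v2) ** 2) + v1 + v2)
        Vnew[i, j] = min(g[i, j], u)
    return Vnew

# Iterating V <- jacobi_sweep(V, g, I, h, interior) from the initial condition
# V = g decreases monotonically toward the approximation V^h of (3.16).
```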



On Weakening (A1.1). As mentioned previously, the target set need
not satisfy an exterior cone condition. This means that in order to prove
convergence, we need an additional assumption beyond local consistency.
The additional condition is needed only in the proof of the upper bound
limsup_{h→0} V^h(x) ≤ V(x). Basically, all that is needed is a "controllability"
condition on the Markov chain. Consider the particular case where the
target set is a single point. The idea is to impose conditions on the chain
which will guarantee that if we can get the chain "close" to the target set
by a control scheme with nearly optimal running cost, then a control with
small running cost can be found that will finish the job of driving the chain
into the target set (at least with a high probability).

Consider the transition probabilities and interpolation interval of Ex-
ample 14.2.1. Let ε > 0 be given and assume (A1.2). Then there exist
h₀ > 0 and γ > 0 with the following properties. For all h < h₀ and for
any pair of points x ∈ G_h, y ∈ G_h satisfying |x − y| < γ, there exists a
nonanticipative control scheme {u_i^h, i < ∞} with the following properties.
If {ξ_i^h, i < ∞} is the controlled chain that starts at x at time zero and if
N_h = inf{i : ξ_i^h = y}, then

1. |ξ_i^h − y| ≤ ε for all i = 0, ..., N_h − 1,

2. Σ_{i=0}^{N_h−1} Δt^h(u_i^h) ≤ ε,

3. Σ_{i=0}^{N_h−1} k(ξ_i^h, u_i^h) Δt^h(u_i^h) ≤ ε

w.p.1. Hence, if the proof of the upper bound can be given when the target
set is of the form N_γ(x₀) (i.e., a set that satisfies the exterior cone con-
dition), then it can be given for target sets of the form {x₀} as well. The
details will be given below. A formulation of conditions on the Markov chain
that includes Example 14.2.1 and the chains derived from this example via
(14.2.3) and (14.2.4) is as follows.
A3.4. Given ε > 0 there exist h₀ > 0, γ > 0, and M < ∞ with the following
properties. Given any h < h₀ and any x, y ∈ G_h such that |x − y| < γ,
there exists a nonanticipative control scheme {u_i^h, i < ∞} satisfying |u_i^h| ≤
M, i < ∞, with the following properties. If {ξ_i^h, i < ∞} is the resulting
controlled chain that starts at x and if N_h = inf{i : ξ_i^h = y}, then
Our convergence theorem can finally be stated.


Theorem 3.7. Assume (A3.3) and (A3.4). Define V^h(x) by (3.15) and
(3.16), where the running cost and terminal cost are as given in Theorem
3.6. Then

Remark. Recall that L was defined in (A3.3) as the set of local minima
inside H. The points {x : I(x) = 1, x ∈ G − L} are the singular points that
are not local minima inside H. On these points we have k(x, 0) = 0, and
thus the assumptions used in Subsection 15.2.1 do not apply.

Proof of the Upper Bound. The proof of the upper bound follows the
lines of the finite time problem of Theorem 14.2.5 except for the difficulties
related to the nature of the target set. Fix x ∈ G − L. If V(x) = B,
there is nothing to prove. Assume V(x) < B, and let ε > 0 be given such
that V(x) + 2ε < B. Recall that V_T(x) is the minimal cost subject to the
restriction ρ ∈ [0, T]. Since V_T(x) ↓ V(x), there exists T < ∞ such that
V_T(x) ≤ V(x) + ε < B. If m(·) is an ε-optimal relaxed control for V_T(x)
and if x(·) is the associated solution, then x(τ) ∈ L for some τ ≤ T, and

∫₀^τ ∫_{ℝ²} k(x(s), α) m(dα ds) + g(x(τ)) ≤ V_T(x) + ε.

[Note that it is not optimal to stop with this given control before x(·) enters
L.]
Select γ > 0 for the given ε according to (A3.4). By Theorem 14.2.2, we
may assume the existence of a finite set U^{γ/2} ⊂ ℝ^k, δ > 0, and an ordinary
control u^{γ/2}(·) with the following properties. u^{γ/2}(·) takes values in U^{γ/2}, is
constant on intervals of the form [jδ, jδ + δ), and if x^{γ/2}(·) is the associated
solution, then

sup_{0≤t≤τ} |x^{γ/2}(t) − x(t)| ≤ γ/2

and

sup_{0≤t≤τ} | ∫₀^t ∫_{ℝ²} k(x(s), α) m_s(dα) ds − ∫₀^t k(x^{γ/2}(s), u^{γ/2}(s)) ds | ≤ ε.

We now define a control scheme for the Markov chain. We will use u^{γ/2}(·)
to define the scheme until the interpolated time reaches τ. If the chain has
not yet entered the target set by this time, then we may have to extend
the definition for times after τ via (A3.4).

In order to apply u^{γ/2}(·) to the chain {ξ_i^h, i < ∞}, we recursively define
the control applied at discrete time i by u_i^h = u^{γ/2}(t_i^h) and t_{i+1}^h = t_i^h +
Δt^h(u_i^h). This defines a control until i such that t_{i+1}^h ≥ τ. Let {ξ_i^h, i < ∞}
be the chain that starts at x and uses this control.

Define
N_h = inf{i : t_i^h ≥ τ or ξ_i^h ∈ L or ξ_i^h ∉ G⁰},
and let τ_h = t_{N_h}^h. By Theorem 14.2.4, we have sup_{0≤t≤τ_h} |ξ^h(t) − x^{γ/2}(t)| →
0 in probability, and P{|ξ^h_{N_h} − x(τ)| ≥ γ} → 0. Assume h > 0 is small
enough that P{|ξ^h_{N_h} − x(τ)| ≥ γ} ≤ ε.

On the set where |ξ^h_{N_h} − x(τ)| < γ, we extend the definition of the control
sequence for discrete times larger than N_h in such a way that (A3.4) is
satisfied. We then stop the process at the discrete time

M_h = inf{i ≥ N_h : t_i^h − t_{N_h}^h ≥ ε or ξ_i^h ∈ L or ξ_i^h ∉ G⁰}.

Note that if ξ^h_{M_h} ∈ L, then |ξ^h_{M_h} − x(τ)| ≤ ε by (A3.4), at least with
probability close to one. Recall that g(x) = f(x) for x ∈ L. On the set
where |ξ^h_{N_h} − x(τ)| ≥ γ, we stop at N_h and pay the stopping cost. The total
cost is then bounded above by

E_x^{u^h} ∫₀^{τ_h} ∫_{ℝ²} k(ξ^h(s), α) m^h(dα ds)
   + P{|ξ^h_{N_h} − x(τ)| ≥ γ} B + ε sup_{y∈G, |α|≤M} k(y, α)
   + P{|ξ^h_{N_h} − x(τ)| < γ, ξ^h_{M_h} ∈ L, and |ξ^h_{M_h} − x(τ)| ≤ ε} sup_{|z−x(τ)|≤ε} f(z)
   + P{|ξ^h_{N_h} − x(τ)| < γ, ξ^h_{M_h} ∉ L, or |ξ^h_{M_h} − x(τ)| > ε} B.

This last sum is itself bounded above by

+ sup_{|z−x(τ)|≤ε} f(z) + 2εB.

Sending h → 0, we obtain

limsup_{h→0} V^h(x) ≤ ∫₀^τ k(x^{γ/2}(s), u^{γ/2}(s)) ds
   + ε sup_{y∈G, |α|≤M} k(y, α) + sup_{|z−x(τ)|≤ε} f(z) + 2εB.

Sending ε → 0 gives
limsup_{h→0} V^h(x) ≤ V_T(x).
Since T < ∞ is arbitrary,
limsup_{h→0} V^h(x) ≤ V(x).

Proof of the Lower Bound. Fix x ∈ G − L and ε > 0. According to
(3.15) and (3.16), for each h > 0 there is n < ∞ such that

(3.17)

For the rest of the proof we will assume that n has been chosen such that
(3.17) holds. Owing to the definition of V_n^h(x), there is a controlled Markov

chain {ξ_i^h, i < ∞} with control sequence {u_i^h, i < ∞} that satisfies ξ_0^h = x,
and a finite stopping time N_h such that

V_n^h(x) ≥ E_x^{u^h} Σ_{j=0}^{(N_h∧M_h)−1} k(ξ_j^h, u_j^h) Δt^h(u_j^h) + E_x^{u^h} g(ξ^h_{N_h∧M_h}) − ε,     (3.18)

where M_h is the time of first exit from G⁰ or entrance into the set L. The
stopping time N_h is the minimum of n and the controlled stopping time.
Although the chain, control sequence, and stopping times depend on n, we
will not indicate the dependence in the notation. Let ξ^h(·) and u^h(·) be
the continuous parameter interpolations of {ξ_i^h, i < ∞} and {u_i^h, i < ∞},
respectively, and let

ρ_h = Σ_{i=0}^{N_h−1} Δt^h(u_i^h),    τ_h = Σ_{i=0}^{M_h−1} Δt^h(u_i^h).

We can then rewrite (3.18) as

V_n^h(x) ≥ E_x^{u^h} ∫₀^{ρ_h∧τ_h} ∫_{ℝ²} k(ξ^h(s), α) m^h(dα ds) + E_x^{u^h} g(ξ^h(ρ_h ∧ τ_h)) − ε,     (3.19)

where m^h(·) is the relaxed control representation of the ordinary control
u^h(·).
Let K_q, q = 1, ..., Q, be disjoint compact connected sets such that K =
∪_{q=1}^Q K_q. The existence of such a decomposition has been assumed in the
statement of Theorem 3.7. Now V(x) is constant on each K_q, so there exists
γ > 0 such that

x ∈ K_q, y ∈ N_γ(K_q)  ⟹  |V(x) − V(y)| ≤ ε     (3.20)

and such that the sets N_γ(K_q) are separated by a distance greater than γ
for distinct q. Because the reflected light intensity I(·) is continuous, there
is c > 0 such that
For simplicity, we will consider the proof of the lower bound for the
case when the initial condition satisfies x ∈ N_{γ/2}(K_q) for some q. The
general case follows easily using the same arguments. We define a sequence
of stopping times by

τ_0^h = 0,
σ_j^h = inf{t ≥ τ_j^h : ξ^h(t) ∉ ∪_{q=1}^Q N_γ(K_q)},
τ_j^h = inf{t ≥ σ_{j−1}^h : ξ^h(t) ∈ ∪_{q=1}^Q N_{γ/2}(K_q) or ξ^h(t) ∉ G⁰}.

Consider the processes

Ξ^h(·) = (ξ_0^h(·), ξ_1^h(·), ...),    M^h(·) = (m_0^h(·), m_1^h(·), ...),

where ξ_j^h(·) = ξ^h(· + σ_j^h) and where m_j^h(·) is the relaxed control represen-
tation of the ordinary control u^h(· + σ_j^h). We consider (Ξ^h(·), M^h(·)) as
taking values in the space

endowed with the usual product space topology. Owing to its definition,
V_n^h(x) is uniformly bounded from above. Thus Theorem 14.2.4 shows that
given any subsequence of {(Ξ^h(·), M^h(·)), h > 0}, we can extract a further
subsequence that converges weakly, and that any limit point

(X(·), M(·)) = ((x_0(·), x_1(·), ...), (m_0(·), m_1(·), ...))

of such a convergent subsequence satisfies

where each m_j(·) is an admissible relaxed control. In addition, the definition
of the stopping times {σ_j^h} guarantees that x_0(0) ∈ ∂N_γ(x) and that for
all j > 0, either x_j(0) ∈ ∂N_γ(K_q) for some q or x_j(0) ∉ G⁰.
Let J_h = min{j : τ_j^h ≥ τ_h}, where τ_h has been defined to be the interpo-
lated time at which ξ^h(·) first exited G⁰ or entered L. It then follows from
the uniform bound from below given in (3.21) and (3.19) that

(3.22)

Define s_j^h = τ_{j+1}^h − σ_j^h and S^h = (s_0^h, s_1^h, ...). It also follows from (3.21) that
there exists c̄ > 0 such that for all q₁ and q₂,

(3.23)

[e.g., c̄ = (2c)^{1/2} γ].
We now prove the lower bound liminf_{h→0} V^h(x) ≥ V(x). Extract a sub-
sequence along which

converges to a limit

(x(·), m(·), ρ, τ, Ξ(·), M(·), J, S).



We assume via the Skorokhod representation that the convergence is w.p.1,
and consider any ω for which there is convergence. If ρ < τ, we have

Next assume τ ≤ ρ. We have

By using the definition of V(·) and an elementary dynamic programming
argument, for each j < J we have

By (3.22), the s_j for j < J are finite w.p.1, and by (3.23) we can assume
that J is uniformly bounded from above. By construction, if j < J − 1 and if
x_j(s_j) ∈ N_{γ/2}(K_q) (which it must be for some q), then x_{j+1}(0) ∈ ∂N_γ(K_q).
Recall that L ⊂ ∪_{q=1}^Q K_q and that L is the set of local minimum points. It
follows from the definitions of τ_j^h and σ_j^h that if τ_h < ρ_h and ξ^h(τ_h) ∈ K_q
for some q, then ξ^h(τ^h_{J_h−1}) ∈ N_{γ/2}(K_q) for that same q. On the other hand,
if τ_h < ρ_h and ξ^h(τ_h) ∉ K_q for any q, then g(ξ^h(ρ_h ∧ τ_h)) = B. Therefore,
in general,

liminf_{h→0} g(ξ^h(ρ_h ∧ τ_h)) ≥ g(x_{J−1}(s_{J−1})) − ε     (3.25)

w.p.1. Now consider the paths x_j(·), j < J < ∞. Clearly, x_0(0) ∈ N_γ(x),
and by (3.20) we have |V(x_{j−1}(s_{j−1})) − V(x_j(0))| ≤ 2ε. By combining this
with (3.24) and (3.25), we obtain

liminf_{h→0} ∫₀^{ρ_h∧τ_h} ∫_{ℝ²} k(ξ^h(s), α) m^h(dα ds) + g(ξ^h(ρ_h ∧ τ_h))
   ≥ Σ_{0≤j<J} ∫₀^{s_j} ∫_{ℝ²} k(x_j(s), α) m_j(dα ds) + g(x_{J−1}(s_{J−1})) − ε
   ≥ Σ_{0≤j<J} [V(x_j(0)) − V(x_j(s_j))] + V(x_{J−1}(s_{J−1})) − ε
   ≥ V(x) + Σ_{0<j<J} [V(x_j(0)) − V(x_{j−1}(s_{j−1}))] − 2ε
   ≥ V(x) − 2Jε

w.p.1. Using (3.17) and (3.18), we have

liminf_{h→0} V^h(x) ≥ V(x) − 2[J + 1]ε.

We conclude by sending ε → 0 and using the fact that J is uniformly
bounded from above. •

Computational Results and Examples. Figure 15.1 displays a surface


that has been sampled at a grid of 128 x 128 points.

Figure 15.1. Original surface.



Figure 15.2. Reconstructed surface.

Figure 15.3. Mannequin image.



Figure 15.4. Reconstructed surface.

Figure 15.5. Illuminated reconstruction.



Figure 15.2 shows the reconstruction provided by the Jacobi algorithm


after 150 iterations. The maximum value of the height function is 20, and
the reconstructed image is within 0.5 of the true image. In producing Figure
15.2, we have used an analytic expression for the surface to compute I(·)
at the grid points. In order to estimate the number of iterations required
for the algorithm itself to converge, we computed discrete approximations
to the derivatives on the basis of the surface sampled only at the grid
points, and then used these values to compute I(·) at the grid points.
With these values of I(·), the reconstruction should be equal to the original
surface modulo only errors due to machine accuracy. Thus, we are able to
separate errors due to the numerical method from those due to stopping
the algorithm (3.15) after too few iterations. Given these values for I(·)
and starting with a large initial condition, the algorithm converged to a
solution correct to within 10⁻⁶ after 150 iterations of Jacobi type. By using
the Gauss-Seidel version of the algorithm presented here and varying the
ordering of the states with each iteration, convergence was observed after
11 iterations.
In Figures 15.3-15.5, we consider the application of the algorithm to a
real 200 by 200 image. Figure 15.3 gives the original image, which is a pic-
ture of the head of a mannequin. The head was actually illuminated from
an oblique direction, and the algorithm used in the reconstruction was the
oblique direction version of the algorithm presented in this subsection [124].
For the reconstruction, the set L was taken to be the position of the tip of
the nose, and the version of the algorithm that allows data to be given at
local maxima was used. Figure 15.4 shows the surface reconstruction ob-
tained after using 6 Gauss-Seidel iterations, and Figure 15.5 shows how this
reconstruction would appear if illuminated by light coming from the same
direction as in Figure 15.3. The Jacobi algorithm required 160 iterations
to converge.
For further examples and detail as well as a comparison of the algorithm
with other proposed schemes, we refer the reader to [46, 124].

15.4 Remarks on the Implementation of the Algorithms and Illustrative Numerical Examples
In this section we comment on the implementation of the algorithms de-
scribed in the previous two sections. We have delayed our remarks until
this point because it is useful to have illustrative examples at hand. The
particular examples that have been programmed by the authors all involve
cost functions for which the minimization in (2.1) can be done analytically,
such as the case of quadratic cost that is discussed in Subsection 15.2.2. In
all cases, the approximation in value space method was used (see Chapter
6). For many control problems, the approximation in policy space method
has significant advantages over approximation in value space. However,
these advantages disappear when approximating the solution to deter-
ministic control problems, because of the nature of the controlled process
ẋ(t) = u(t). While the approximating processes ξ^h(·) are not determinis-
tic, the fact that they are "nearly" deterministic means that information
contained in the boundary or stopping cost is passed back into the interior
in the form of a propagating "front," with points between the boundary
and the front essentially equal to their fixed point values, and with the
values at points on the other side of the front having essentially no effect
on the "solved" grid points. The position of the "front" moves only when
the policy is updated, and so the extra iterations (under a fixed policy)
required by iteration in policy space are of little benefit.

On the Construction of Fast Iterative Schemes. One can interpret
the iterative schemes used to solve the discretized dynamic programming
equations as functionals of a controlled Markov chain. This interpretation
is very useful in understanding the behavior of these iterative solvers, and
in particular for the construction of rapid and efficient solvers. To simplify
the discussion, let us consider the problem without a controlled stopping
time, so that (2.1) is replaced by

V^h(x) = min_{α∈ℝ^k} [ Σ_y p^h(x, y|α) V^h(y) + k(x, α) Δt^h(α) ]     (4.1)

if x ∈ G_h⁰, and V^h(x) = g(x) if x ∉ G_h⁰. Suppose one were to use the
approximation in value space method to solve (4.1). Let the iterates be
denoted by V_i^h(x), and for some given function f(·), let the initial condition
be V_0^h(x) = f(x). Then V_i^h(x) can be interpreted as the cost function of a
controlled discrete time Markov chain that starts at x at time 0 and has
transition probabilities p^h(x, y|α), running cost k(x, α)Δt^h(α), stopping set
∂G_h⁺, stopping cost g(·), and terminal cost (assigned at time i if the chain
has not yet entered ∂G_h⁺) of f(·). See the discussion in Subsection 6.2.2.
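In code, one step of the approximation in value space method for (4.1) has the following shape. The sketch is ours; the data structures (a dictionary of values indexed by grid point, an explicit finite list of candidate controls, and callbacks for p^h, Δt^h, k, and g) are illustrative assumptions, since in the examples of this chapter the minimization over α is in fact carried out analytically.

```python
def value_iteration_step(V, interior, controls, p, dt, k, g, boundary):
    """One approximation-in-value-space step for (4.1).

    V              : dict mapping every grid point (interior and boundary)
                     to the current iterate V_i^h
    interior       : the points of G_h^0
    controls       : finite set of candidate controls (a discretization of
                     the minimization over alpha in (4.1))
    p(x, a)        : list of (y, prob) pairs, the transition probabilities
                     p^h(x, y | a) of the approximating chain
    dt(a), k(x, a) : interpolation interval and running cost
    g(x), boundary : boundary cost and the grid points outside G_h^0
    """
    V_new = {}
    for x in interior:
        V_new[x] = min(
            sum(prob * V[y] for y, prob in p(x, a)) + k(x, a) * dt(a)
            for a in controls
        )
    for x in boundary:
        V_new[x] = g(x)
    return V_new
```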
For the calculus of variations problems considered in this chapter the
only given data are the values of V(x) on ∂G, which must be propagated
back into G⁰ along the minimizing paths. The situation is similar with
the discrete approximation V_i^h(x). The only given data are the values of
V_i^h(x) at the points x ∈ ∂G_h⁺. This information is "passed along" to the
points x ∈ G_h⁰ in the iterative scheme used to solve (4.1) by the controlled
Markov chain. The information is passed along most quickly if the chain
[while evolving under the optimal policy] reaches ∂G_h⁺ most quickly. Note
that a large value of f will enhance the movement of the chain toward the
boundary. We will discuss this aspect further below.
In constructing chains that will propagate the data most quickly, the
flexibility in the choice of approximating chain can be used to great ad-
vantage. In fact, assuming that the calculations required to evaluate (4.1)
are not too onerous, the chain of Example 14.2.1 is the natural choice.
Two properties of this chain make it particularly efficient. The first is that
it is "one-sided," i.e., given any coordinate vector e_i, if ⟨e_i, α⟩ ≥ 0 then
⟨e_i, y − x⟩ ≥ 0 for all y in the support of p^h(x, ·|α). To understand the
behavior of a chain without this property, consider an uncontrolled chain
whose transition function is given by (1 − h)p^h(x, y) + h q^h(x, y), where p^h is
one-sided and q^h is the transition function of a standard random walk on
hℤⁿ. With a small modification of the interpolation interval Δt^h the local
consistency conditions remain valid. However, the presence of the random
walk component perturbs the behavior, acting like a small second order
term in the PDE and like an approximation to a Wiener process in the
stochastic control problem. Although the effect of this term disappears in
the limit h → 0, it can significantly degrade the performance of the algo-
rithm. In particular, the rate of convergence of the iterative schemes used
to solve equations such as (4.1) is much slower. For example, the spectral
radius of the matrix {p^h(x, y|α), x ∈ G_h, y ∈ G_h} appearing in (4.1) for
each fixed control under which the cost is finite is often zero if the chain is
one-sided, but nonzero otherwise.

The second property is p^h(x, x|α) = 0 for all x and α ≠ 0. This corre-
sponds to making Δt^h(α) large. Recalling the interpretation of the iterative
scheme as the solution of a finite stage, discrete time problem, it is clear
that this property is needed if the iterative solver is to converge rapidly,
and that it will be especially important near points where the minimizing
trajectory has velocity near zero. At such points each iteration of the dis-
crete algorithm corresponds to a very large continuous time interval in the
interpolated process. If a chain with constant time interpolation is used
instead then the number of iterations required for convergence tends to
infinity as the velocity tends to zero.
Since the transition probabilities of Example 14.2.1 move the chain off the
current grid point w.p.1 and along the direction of the optimal trajectory,
one would expect the Jacobi algorithm to converge in a number of steps that
is inversely proportional to h, and Gauss-Seidel to converge in a number
of steps that is essentially independent of h. This behavior holds in the
examples presented below, and in fact has been observed in every problem
to which the authors have applied these algorithms.

On the Role of the Initial Condition for the Iterative Solver. As
long as there is a positive lower bound on the running cost, the dynamic
programming equation for the approximating chain will have a unique
solution. However, while there may be convergence to the unique solution
regardless of the initial condition chosen for the algorithm, the speed of
convergence depends heavily on this initial condition. To see why this is
so, recall once more the interpretation of V_i^h as the minimal cost for an i-
step optimal control problem with terminal cost V_0^h. Viewed as a terminal
cost, a large value of V_0^h encourages the optimal control to move the chain
towards the boundary (in order to avoid this cost). Thus, the boundary
data can be learned and propagated back into the interior quickly. On the
other hand, a small value of V_0^h gives no incentive for the chain to seek the
boundary. The accumulated running cost eventually directs the process to-
wards the boundary, but it may take a large number of iterations to do
so and the convergence will be especially slow if the running cost can be
near zero. The extreme case of a running cost that is not bounded away
from zero is considered in Subsection 15.3.3. Here the choice of a proper
initial condition is critical, and owing to nonuniqueness the specification of
the initial condition is an essential part of the algorithm formulation. With
this discussion in mind, we use large initial conditions for all the following
examples.

Numerical Examples. In this section we present approximations ob-
tained using the algorithm described in Subsection 15.2.2. Even though
there is no controlled stopping in the problems we will present, when com-
puting we use an algorithm that would be appropriate for controlled stop-
ping with stopping cost B. Thus in the Jacobi iteration V_{j+1}^h(x) is given
by

inf_α [ Σ_{y∈G_h} p^h(x, y|α) V_j^h(y) + k(x, α) Δt^h(α) ] ∧ B,

and with the obvious modification for the Gauss-Seidel algorithm. Here
B is any upper bound for V(x). The additional minimization is needed to
guarantee that V_j^h(x) is monotonically nonincreasing in j. Since B is larger
than V(·) on G, this control does not alter the value of V(x), since it is
never invoked by any optimal trajectory.
The ordering of the states is relevant for the performance of the Gauss-
Seidel iteration (see Subsection 6.2.4). In general, the states should be
ordered in such a way that the iteration goes against the tendency of the
flow as much as possible. However, since this is not known a priori, it is
best to alternate between several reasonable orderings of the state variables.
For example, for problems in two dimensions the iteration is performed by
alternating between the following four orderings of states: (i) from top to
bottom, and within each row from left to right; (ii) from bottom to top, and
within each row from right to left; (iii) from left to right, and within each
column from top to bottom; and (iv) from right to left, and within each
column from bottom to top. The ordering for three dimensional problems
is done similarly.
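A sweep driver of the following form makes the alternation explicit. It is our own sketch: the `update` callback stands for the pointwise minimization (for example, the explicit formula of Subsection 15.3.3 applied in place), and the handling of boundary points is assumed to be done inside it.

```python
def gauss_seidel_sweep(V, update, nx, ny, ordering):
    """One in-place Gauss-Seidel sweep over an nx-by-ny array of grid values.
    `update(V, i, j)` performs the pointwise minimization at grid point (i, j)
    using the current (partially updated) values in V; `ordering` in
    {0, 1, 2, 3} selects one of the four sweep directions (i)-(iv) above."""
    forward = ordering in (0, 2)
    rows = list(range(nx)) if forward else list(range(nx - 1, -1, -1))
    cols = list(range(ny)) if forward else list(range(ny - 1, -1, -1))
    if ordering in (0, 1):            # sweep row by row
        for i in rows:
            for j in cols:
                update(V, i, j)
    else:                             # sweep column by column
        for j in cols:
            for i in rows:
                update(V, i, j)

# Alternating the ordering from one sweep to the next, e.g.
#     for n in range(num_sweeps):
#         gauss_seidel_sweep(V, update, nx, ny, ordering=n % 4)
# cycles through the four directions so that no single flow direction is
# systematically favored.
```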
In all the tables, we use m to denote 1/h. Of particular note is the
independence (with respect to m) of the number of iterations required for
convergence by the Gauss-Seidel algorithm.

Figure 15.6. Approximated Value Function for (4.2) with m = 30.

Example 1: Our first example considers a minimum escape time problem,
which is a nondegenerate problem with zero drift. We consider escape from
an open set G ⊂ ℝ^k.

As discussed earlier, there are different representations for the value func-
tion for this problem. On the one hand, one can take the running cost to
be identically equal to 1, in which case the constraint on the control space
gives a complicated description of the dynamics. On the other hand, one
can consider a quadratic running cost of the form

k(x, α) = (1/4)|α|² + 1,

in which case the dynamics are simply φ̇ = u. We adopt the latter represen-
tation since it is the one that is best suited for the numerical approximations
described in Subsection 15.2.2.

We first analyze a two-dimensional problem on the set

The value function for this problem is defined over 5 different regions as
shown in Figure 15.7. The derivative of the value function has disconti-
nuities over the boundaries that separate these regions, resulting in sharp
edges in the graph of the value function. As can be seen in Figure 15.6, the
numerical approximation preserves the sharp corners of the figure. Nev-
ertheless, the error in the approximation is highest at points where these
sharp edges occur (Figure 15.8).

Figure 15.7. True controls.          Figure 15.8. Errors.

m     Iterations     Maximum error
5          8          0.6739E-01
10        14          0.6066E-01
15        20          0.3821E-01
20        26          0.3160E-01
25        32          0.2828E-01
30        38          0.2246E-01

Table 15.1: Minimum Distance for the set (4.2)
Jacobi Iteration

The approximation results are provided in Table 15.1. The leftmost col-
umn corresponds to the number of grid points on each half-axis. The same
maximum errors were obtained with a Gauss-Seidel procedure with only
5 iterations irrespective of m. The iterative scheme was applied until the
maximum difference between successive iterates was less than .001. The
same stopping criterion is also used for the other examples discussed in
this section.

Example 2: Let G be the open unit cube in ℝ³. Consider the running
cost

where

b = (−2, −2, −4)'  and

and exit cost

g(x₁, 1, x₃)
g(x₁, −1, x₃)
g(x₁, x₂, 1)

Results for a Gauss-Seidel iteration are given in Table 15.2. The maximum
error is obtained by comparing the approximation with the true solution
given by

m     Iterations     Maximum error
5          8          .384591E+00
10         8          .216769E+00
15         8          .150467E+00
20         8          .115211E+00
25         8          .933445E-01
30         8          .784566E-01
35         8          .676655E-01

Table 15.2: Example 2
Gauss-Seidel Iteration
Example 3: Our last example concerns a variational problem which
arises when considering the large deviation properties of a diffusion approx-
imation to a phase locked loop model [45]. Strictly speaking, this example
is not covered by Subsection 15.2.2 since k is not finite at all points. It is,
however, a straightforward limiting case, and details can be found in [14].
On the unit square in ℝ², consider the running cost

if α₂ = −πx₁
otherwise,

with
b₁(x) = −2πx₁ + β sin πx₂
and γ > 0. The "π" scaling above is convenient so that the set of interest
can be taken to be the unit square.

Table 15.3 gives the results for a Gauss-Seidel approximation with γ =
0.001 and β = 1. In the table, we also record the successive differences for
the approximations as a function of m in the rightmost column.

Figure 15.9. Approximated Value Function for β = 1 (m = 60).

m      Iterations     V(0,0)            Successive Differences
60        10          0.25635858E+01    0.00000000
70        10          0.25620889E+01    0.14968709E-02
80        10          0.25609476E+01    0.11412458E-02
90        10          0.25600438E+01    0.90388511E-03
100       10          0.25592926E+01    0.75118958E-03
110       10          0.25586792E+01    0.61332968E-03
120       10          0.25581709E+01    0.50837769E-03
130       10          0.25577435E+01    0.42738899E-03
140       10          0.25573828E+01    0.36069404E-03

Table 15.3: Example 3 with β = 1, γ = .001
Gauss-Seidel Iteration
16 The Viscosity Solution Approach to Proving Convergence of Numerical Schemes

In Chapters 10 to 15, we have shown the convergence of properly designed


numerical approximations for a wide range of stochastic and deterministic
optimal control problems. The approach to proving the convergence has
been based on demonstrating the convergence of a sequence of controlled
Markov chains to a controlled process (diffusion, jump diffusion, etc.) ap-
propriate to the given stochastic or deterministic optimal control problem.
In this chapter, we will very briefly describe an alternative approach for
proving the convergence of numerical schemes. The approach is based on
what is referred to as the ''viscosity solution" method (due to Crandall and
Lions [31]) of defining and characterizing solutions to a wide class of par-
tial differential equations. In particular, this method is applicable to many
equations for which there are no classical sense solutions, a situation that
is common for the PDE that are associated to stochastic and determinis-
tic optimal control problems. The notion of solution allows an alternative
method for proving the convergence of schemes for certain types of prob-
lems. In general, all the development of Chapters 4 to 8 that is concerned
with deriving and solving numerical schemes applies here as well.
The application of viscosity solution methods to proving convergence of
numerical approximations is currently a very active research area. Conse-
quently, we will not give a detailed exposition of the methodology in its
most general form. Rather, we will try to describe some of the basic ideas
involved in using the method, and indicate the appropriate literature for
the reader interested in learning more. The approach we describe follows
Souganidis [140] and Barles and Souganidis [8].
For illustrative purposes, we will examine a control problem that involves
a reflected diffusion process with a discounted running cost. Our example


includes and gives an alternative approach to some of the "heavy traffic"
problems discussed in Chapter 8. In Section 16.1, we give the definition of
viscosity solutions for the associated Bellman equation. This equation was
formally derived in Section 3.4. After stating the definition of the solution,
we discuss the existence and uniqueness of solutions, as well as the relation-
ship of the solution to the value function of the stochastic control problem.
Following this in Section 16.2 is a discussion of the key assumptions that
are required, both of the solution to the equation and the numerical scheme
itself, in order that a convergence proof be applicable. These assumptions
are in some rough sense analogous to the main assumptions used by the
Markov chain method, and some remarks along these lines are included. We
conclude the chapter in Section 16.3 by exhibiting the proof of convergence
for our illustrative example.

16.1 Definitions and Some Properties of Viscosity


Solutions
We first describe the illustrative example. We consider a compact domain
G, which for simplicity we take to be a rectangle of the form {x : c_i ≤ x_i ≤
d_i, i = 1, ..., k}, where x_i denotes the ith component of x ∈ ℝ^k and c_i < d_i.
As our controlled process, we consider a controlled reflected diffusion of the
type described in Section 1.4. Let G_j, j = 1, ..., 2k, be the sets of the form
{x : x_i ≥ c_i} and {x : x_i ≤ d_i} for i = 1, ..., k. For each j = 1, ..., 2k, we
let r_j : ℝ^k → ℝ^k be a Lipschitz continuous function such that r_j(x) ≠ 0
for x ∈ ∂G_j. Let n_j be the inward normal to ∂G_j. Then we also assume
r_j(x)'n_j ≥ 0 for all x ∈ ∂G_j. We define the set of directions r(·) by

In other words, r(x) is the intersection of the unit sphere with the closed
convex cone generated by r_j(x) for all j such that x ∈ ∂G_j. Our model
then becomes

x(t) = x + ∫₀^t b(x(s), u(s)) ds + ∫₀^t σ(x(s)) dw(s) + z(t),

where z(·) satisfies the conditions of Definition 1.4.2.


For a cost criterion we use a discounted cost. For an admissible ordinary
control u(·), the cost is

W(x, u) = E_x^u [ ∫₀^∞ e^{−βt} k(x(t), u(t)) dt ],

where β > 0. The value function is then

V(x) = inf W(x, u),     (1.1)

where the infimum is over all admissible ordinary controls. Of course, con-
ditions must be given that will guarantee that this problem is well defined.
Precise conditions on b(·), σ(·), r_j(·), and k(·, ·) will be given below.
In Chapter 3 it was formally demonstrated that V(x) satisfies the fol-
lowing Bellman equation with reflecting boundary condition:

inf_{α∈U} [ L^α V(x) − βV(x) + k(x, α) ] = 0,
V_x(x)'r = 0  for r ∈ r(x), x ∈ ∂G,     (1.2)

where for a twice continuously differentiable function f(·), we define

(L^α f)(x) = f_x'(x) b(x, α) + (1/2) tr [ f_xx(x) a(x) ],

and where a(·) = σ(·)σ'(·). In general, equations of this type need not
have any classical sense solutions (i.e., solutions which satisfy the equation
and boundary condition at all points in G). Thus, one is tempted to pose
a weaker notion of solution, e.g., to require satisfaction of the equation
and boundary condition save on a subset of G having Lebesgue measure
zero. One can show that the minimal cost function defined by (1.1) is a
solution of this type (under the conditions given below). However, with such
a definition, equations such as (1.2) may have many solutions. One of the
successes of the theory of viscosity solutions is that for a large class of such
equations it gives a notion of solution that is weak enough that solutions
will exist, and strong enough that uniqueness can also be guaranteed. It
also turns out [58, 114] that for many problems one can prove that the
value function for an optimal control problem is a viscosity solution for the
appropriate Bellman equation. In the presence of a uniqueness result, this
gives a useful alternative characterization of the value function.
The theory of viscosity solutions was first developed in the context of first
order nonlinear PDE [29, 31] and has since been extended in many direc-
tions, including fully nonlinear second order partial differential equations
[80, 114]. For a recent overview of the theory as well as a more complete list
of references than is given here, the reader may consult the survey paper
of Crandall, Ishii, and Lions [30].

Definition of Viscosity Solutions. There are several equivalent defini-
tions of viscosity solutions to (1.2), and the particular definition that is
used may depend on the intended application. We will use the following.
Let S^k denote the set of real valued symmetric k × k matrices. For v ∈ ℝ,
x ∈ G, p ∈ ℝ^k, and X ∈ S^k, we define the function

F(x, v, p, X) = sup_{α∈U} [ −(1/2) tr [X a(x)] − p'b(x, α) + βv − k(x, α) ].

[We have reversed the sign in (1.2) to follow a standard convention in
the literature.] Owing to the possible degeneracy of a(·), it may be the case
that the boundary conditions are not satisfied in any normal sense for parts
of ∂G. In part, this motivates the following form in which the boundary
conditions are incorporated into the definition. We set

F*(x, v, p, X) = F_*(x, v, p, X) = F(x, v, p, X)   for x ∈ G⁰,

F*(x, v, p, X) = F(x, v, p, X) ∨ max{−r'p : r ∈ r(x)}
F_*(x, v, p, X) = F(x, v, p, X) ∧ min{−r'p : r ∈ r(x)}     for x ∈ ∂G.

Definition. An upper semicontinuous function V(·) on G is called a vis-
cosity subsolution if the following condition holds. If φ(·) ∈ C²(G) and if
x₀ ∈ G is a local maximum point of V(·) − φ(·), then

F_*(x₀, V(x₀), φ_x(x₀), φ_xx(x₀)) ≤ 0.     (1.3)

Similarly, a lower semicontinuous function V(·) on G is called a viscosity
supersolution if whenever φ(·) ∈ C²(G) and x₀ ∈ G is a local minimum
point of V(·) − φ(·), then

F*(x₀, V(x₀), φ_x(x₀), φ_xx(x₀)) ≥ 0.     (1.4)

A continuous function V(·) is a viscosity solution if it is both a subsolution
and a supersolution.

Remarks. Note that in the definition given above subsolutions and super-
solutions are required to be only semicontinuous and not continuous. This
turns out to allow a considerable simplification in the proof of convergence
of schemes. The technique we will use originates in the papers of Barles
and Perthame [6, 7] and Ishii [77, 76].

Because in the remainder of the chapter we consider only viscosity solu-


tions (respectively, viscosity subsolutions and viscosity supersolutions}, we
will drop the viscosity term and refer to such functions simply as solutions
(respectively, subsolutions and supersolutions).
The key properties we will require of solutions to (1.2) are summarized
as the conclusion to Theorem 1.1 below. Before stating the theorem, we
list the properties of b(·), a(·), r_j(·), and k(·, ·) that will be used. These
conditions are more than sufficient to guarantee the existence of a weak
sense solution to our stochastic differential equation with reflection that is
unique in the weak sense [41].

A1.1. The set U is compact. The functions b(·, ·) and k(·, ·) are continuous
on G × U, and b(·, α) is Lipschitz continuous, uniformly in α ∈ U. The
function a(·) is Lipschitz continuous on G.

A1.2. For each x ∈ ∂G, the convex hull of the set r(x) does not contain
the origin. For each x ∈ ∂G, let J(x) = {j : x ∈ ∂G_j}. We may assume
without loss of generality that J(x) = {1, ..., m} for some m ≤ k. Set
v_ij = |n_i'r_j(x)| − δ_ij, where δ_ij = 0 if i ≠ j and δ_ij = 1 if i = j. Then
for each x ∈ ∂G, we assume that the spectral radius of the m × m matrix
V = (v_ij) is strictly less than one.

Remark. The spectral radius assumption (A1.2) is common in the litera-


ture related to heavy traffic problems [69, 107]. However, it is significantly
stronger than the "completely-S" condition discussed in Chapter 5.

Theorem 1.1. Assume (A1.1) and (A1.2). Then the following comparison
result holds. If V*(·) and V_*(·) are respectively a subsolution and superso-
lution to (1.2), then
V*(x) ≤ V_*(x)     (1.5)
for all x ∈ G. Define V(x) by (1.1). If V(·) is continuous, then it is a
solution to (1.2).

Remarks. Note that if a solution to (1.2) exists, then, by (1.5), it must
be unique. If V₁(·) and V₂(·) are both solutions, then since they are also
subsolutions and supersolutions, we have

V₁(x) ≤ V₂(x) ≤ V₁(x)

for all x ∈ G. There are many sets of conditions that will guarantee that
V(·) defined by (1.1) is continuous. Although we will not prove it, the
continuity of V(·) follows under (A1.1) and (A1.2). A proof can be based
on the methods of weak convergence and the uniqueness results given in
[41]. We note that the proof of continuity of V(-) must be based on the
representation {1.1). The continuity of V(·) is related to the continuity of
the total cost under the measure on the path space induced by an optimally
(or €-optimally) controlled process. This continuity has been an important
consideration for the probabilistic approach described previously.

In keeping with the objectives of this chapter, we will not give a proof of
Theorem 1.1. Instead, we will simply piece together some existing results in
the literature. The comments that are given are intended to outline what is
needed to apply uniqueness results for a class of PDE to prove convergence
of numerical schemes for a related control problem. The assumptions we
have made on G and r( ·) make the setup used here a special case of the
general result given in [43]. These conditions relate directly to the reflected
diffusion and need no further discussion.
In the proof of an inequality such as (1.5), it is most natural to place
conditions on F(·, ·, ·, ·). It is proved in [43] that the comparison principle
of Theorem 1.1 holds under the following conditions.

1. For all x ∈ G, v ∈ ℝ, p ∈ ℝ^k, and X, Y ∈ S^k with X ≥ Y,

   F(x, v, p, X) ≤ F(x, v, p, Y).     (1.6)

2. F(·, ·, ·, ·) is continuous on G × ℝ × ℝ^k × S^k.     (1.7)

3. There is a continuous function m₁ : [0, ∞) → ℝ satisfying m₁(0) = 0
   such that for all θ ≥ 1, x, y ∈ G, v ∈ ℝ, p ∈ ℝ^k, X, Y ∈ S^k,

   F(y, v, p, −Y) − F(x, v, p, X) ≤ m₁(|x − y|(|p| + 1) + θ|x − y|²)     (1.8)

   whenever

   −θ [I 0; 0 I] ≤ [X 0; 0 Y] ≤ θ [I −I; −I I]

   in the sense of symmetric matrices (here I is the k × k unit matrix).

4. There is an open neighborhood U of ∂G (relative to G) and a con-
   tinuous function m₂ : [0, ∞) → ℝ satisfying m₂(0) = 0 such that for
   all x ∈ U, v ∈ ℝ, p, q ∈ ℝ^k, X, Y ∈ S^k,

   |F(x, v, p, Y) − F(x, v, q, X)| ≤ m₂(|p − q| + ||X − Y||).     (1.9)

(As noted previously there are many equivalent definitions of viscosity so-
lution. In particular, the definition used in [43] is equivalent to the one used
here.)
Thus, in order to make use of the results of [43], we must describe condi-
tions on the components appearing in the statement of the control problem
(b(·), a(·), and k(·, ·)) that are sufficient to guarantee (1.6)-(1.9). Property
1 is called "degenerate ellipticity" and follows easily from the definition
of F(·, ·, ·, ·) (it also is a consequence of property 3, as remarked in [30]).
Properties 2 and 4 are rather simple consequences of (A1.1). Property 3 is
the most difficult to verify, and, in particular, requires the Lipschitz conti-
nuity conditions given in (A1.1). We omit the proof, and instead refer the
reader to [30].

The last issue to be resolved is whether or not V(·) as defined by (1.1)
is indeed a solution to (1.2). It turns out that the main difficulty here is in
verifying that V(·) satisfies the dynamic programming principle: for δ > 0
and x ∈ G,

(where the infimum is over all admissible controls).


If we assume that V(·) is continuous and that (1.10) holds, then minor
modifications of the proof of Theorem 3.1 in [58, Chapter 5] show that V(·)
is indeed a solution of (1.2).
Thus, all that needs to be verified is (1.10). For our particular problem the
process is never stopped, and the dynamic programming equation follows
from the Markov property and the definition of V(·).

16.2 Numerical Schemes


We retain the notation of the previous chapters. Thus S_h is a grid in ℝ^k,
G_h = S_h ∩ G, and G_h^0 = S_h ∩ G^0. We let ∂G_h^+ (the "discretized reflecting
boundary") be a subset of S_h - G_h^0 that satisfies sup_{x∈∂G_h^+} inf_{y∈G} |x - y| → 0
as h → 0. In this chapter we will consider numerical approximations that
are defined by a collection of relations of the form

S^h(x, V^h(x), V^h(·)) = 0,   x ∈ G_h^0 ∪ ∂G_h^+.     (2.1)
Of course, most of the numerical schemes of the previous chapters are of
this form. In previous chapters, the assumptions on the scheme have been
phrased in terms of the local consistency of an associated Markov chain.
Here, we will impose assumptions directly on S^h(·,·,·). In the assumptions
that are given below, u1(·) and u2(·) will denote real valued functions on
G_h^0 ∪ ∂G_h^+. In order to make sense of S^h(x,·,·) for x ∈ ∂G_h^+, it will be
necessary to extend the domain on which F*(·,·,·,·) and F_*(·,·,·,·) are
defined. To do this, we will assume (as in Chapter 5) the existence of an
extension of r(·) [again denoted by r(·)] from ∂G to an open neighborhood
G_1 of ∂G, such that r(·) is upper semicontinuous: if x_n → x and r_n → r,
with r_n ∈ r(x_n), then r ∈ r(x). Such an extension can easily be constructed
given the special form of G and the definition of r(·). We then define

F*(x, v, p, X) = max{-r'p : r ∈ r(x)}
                                              for x ∈ G_1 - G.
F_*(x, v, p, X) = min{-r'p : r ∈ r(x)}

The extensions F*(·,·,·,·) and F_*(·,·,·,·) retain the upper semicontinuity
and lower semicontinuity properties, respectively.
We can now state the assumptions on S^h(·,·,·). We assume h > 0 is
small enough that ∂G_h^+ ⊂ G_1.
A2.1. If u1(·) ≥ u2(·), then for all h > 0, x ∈ G_h^0 ∪ ∂G_h^+, and v ∈ ℝ, we
have

S^h(x, v, u1(·)) ≤ S^h(x, v, u2(·)).
A2.2. For all h > 0, there is a unique solution V^h(·) to (2.1), and this
solution has a bound that is independent of h for small h > 0:

limsup_{h→0}  sup_{x ∈ G_h^0 ∪ ∂G_h^+} |V^h(x)| < ∞.

A2.3. There is consistency in the following sense. For any δ ∈ ℝ and u(·)
defined on G_h^0 ∪ ∂G_h^+, we write u(·) + δ for the function defined by
x ↦ u(x) + δ. Suppose we are given any φ(·) ∈ C²(G_1), any x ∈ G, any
sequence {y^h, h > 0} satisfying y^h ∈ G_h^0 ∪ ∂G_h^+ and y^h → x, and any
sequence δ^h → 0 in ℝ. Then

limsup_{h→0} S^h(y^h, φ(y^h) + δ^h, φ(·) + δ^h) ≤ F*(x, φ(x), φ_x(x), φ_xx(x))

and

liminf_{h→0} S^h(y^h, φ(y^h) + δ^h, φ(·) + δ^h) ≥ F_*(x, φ(x), φ_x(x), φ_xx(x)).
Remark. Given x, y^h, and δ^h as in the statement of (A2.3), but with
δ^h → δ, it follows that

limsup_{h→0} S^h(y^h, φ(y^h) + δ^h, φ(·) + δ^h) ≤ F*(x, φ(x) + δ, φ_x(x), φ_xx(x))

and

liminf_{h→0} S^h(y^h, φ(y^h) + δ^h, φ(·) + δ^h) ≥ F_*(x, φ(x) + δ, φ_x(x), φ_xx(x)).

This follows from (A2.3) by replacing φ(·) by φ_δ(·) = φ(·) + δ.


Remarks on the Assumptions. In both the viscosity solution and Markov
chain approaches to the convergence of schemes, two different types of
assumptions appear. The first category may be labeled as "conditions on
the limit," where limit refers to a limit controlled process in the Markov
chain case and a limit PDE in the viscosity solution case. In the Markov
chain approach, we usually require weak sense uniqueness of an ε-optimally
controlled process. For the viscosity solution approach we need a comparison
principle as in Theorem 1.1, which implies the uniqueness of solutions.
The second category of assumptions consists of conditions placed directly
on the scheme itself, i.e., (A2.1)-(A2.3). Here we will make a few
remarks on this type of assumption. An obvious question to ask is how
functions S^h(·,·,·) satisfying (A2.1)-(A2.3) might be found. One possibility
is to examine the numerical schemes constructed via the Markov chain
approach in Chapter 5. In the context of our example, we have transition
probabilities p^h(x, y|a) and interpolation times Δt^h(x, a) that satisfy the
proper local consistency properties: if {ξ^h_i, i < ∞} is the controlled Markov
chain that uses these probabilities, then

E^{h,a}_{x,n} Δξ^h_n = b(x, a) Δt^h(x, a) + o(Δt^h(x, a)),     (2.2)

cov^{h,a}_{x,n} Δξ^h_n = a(x) Δt^h(x, a) + o(Δt^h(x, a))     (2.3)



for x ∈ G_h^0; there are ε_1 > 0, c_1 > 0, and c_2(h) → 0 as h → 0 such that

E^{h,a}_{x,n} Δξ^h_n ∈ {θγ + o(h) : c_2(h) ≥ θ ≥ c_1 h, γ ∈ r(x)},     (2.4)

cov^{h,a}_{x,n} Δξ^h_n = O(h²),     (2.5)

p^h(x, G_h^0) ≥ ε_1,   Δt^h(x, a) = 0     (2.6)

for all h > 0 and x ∈ ∂G_h^+. We can then use an approximation V^h(·)
defined by

V^h(x) = inf_{a∈U} [ e^{-βΔt^h(x,a)} Σ_y p^h(x, y|a) V^h(y) + Δt^h(x, a) k(x, a) ].

Approximating e^{-βΔt^h(x,a)} by 1/(1 + βΔt^h(x, a)) (as discussed in Chapter
5), we can also use the scheme

We can rewrite this last equation as

This suggests the definition

S^h(x, v, u(·)) = sup_{a∈U} [ - (1/Δt^h(x, a)) Σ_y p^h(x, y|a) [u(y) - v] + βv - k(x, a) ].

The property (A2.1) is clearly a consequence of the nonnegativity of the
p^h(x, y|a) and Δt^h(x, a). Using the interpretation of V^h(x) as the value
function for a controlled Markov chain, one can show (A2.2). Finally, (A2.3)
follows from the local consistency conditions (2.2)-(2.6).
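To make the preceding discussion concrete, the following sketch carries out approximation in value space for an approximation V^h(·) of the above type, in one space dimension. The model data here (controlled drift b(x, a) = a, running cost k(x, a) = x² + a², constant variance a(x) = σ², discount β, and state space G = [0, 1] with reflection at the endpoints) are illustrative assumptions and are not taken from the text; the transition probabilities are standard upwind finite difference choices, and the reflecting boundary is treated crudely by returning the outgoing probability mass to the endpoint rather than via separate reflecting states with a zero interpolation interval.

```python
import numpy as np

# Illustrative data (assumed): b(x,a) = a, k(x,a) = x^2 + a^2, a(x) = sigma2.
h, beta, sigma2 = 0.02, 0.5, 0.1
grid = np.arange(0.0, 1.0 + h / 2, h)     # grid points of G_h
U = np.linspace(-1.0, 1.0, 21)            # discretized control set

def b(x, a):
    return a

def k(x, a):
    return x ** 2 + a ** 2

V = np.zeros(len(grid))
for _ in range(10000):                    # approximation in value space
    V_new = np.empty_like(V)
    for i, x in enumerate(grid):
        best = np.inf
        for a in U:
            # Upwind coefficients: Q^h(x,a), Delta t^h = 1/Q^h, p^h = q^h/Q^h,
            # chosen so that local consistency (2.2)-(2.3) holds.
            Qh = sigma2 / h ** 2 + abs(b(x, a)) / h
            dt = 1.0 / Qh
            p_up = (sigma2 / 2.0 + h * max(b(x, a), 0.0)) / (h ** 2 * Qh)
            p_dn = (sigma2 / 2.0 + h * max(-b(x, a), 0.0)) / (h ** 2 * Qh)
            iu, idn = min(i + 1, len(grid) - 1), max(i - 1, 0)   # crude reflection
            val = np.exp(-beta * dt) * (p_up * V[iu] + p_dn * V[idn]) + dt * k(x, a)
            best = min(best, val)
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new
```

Because the p^h(x, y|a) are nonnegative and sum to one, the map defined by the right side of the iteration is monotone and, for β > 0, a contraction; this is what lies behind (A2.1) and the uniform bound of (A2.2) for the corresponding S^h.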
Thus, the schemes one is naturally led to via the Markov chain approach
satisfy (A2.1)-(A2.3) (at least for the given example). However, a method
that is perhaps more natural from the PDE point of view is to substitute
"finite difference" approximations into (1.2) that are based on evaluating
V(·) at the grid points G_h^0 ∪ ∂G_h^+, and then try to draw out the restrictions
placed on the approximations by (A2.1)-(A2.3). For example, consider a
point x ∈ G^0. Suppose that finite difference approximations based on the
values V^h(y), y ∈ G_h^0 ∪ ∂G_h^+, are substituted into

F(x, V(x), V_x(x), V_xx(x)) = 0.



In particular, let us assume that the finite difference approximations are
linear functions of differences of the form V(y) - V(x). We thereby obtain
a relation of the form (2.8), in which the coefficients q^h(x, y|a) depend on
b(·,·) and a(·). From (A2.1), we deduce that q^h(x, y|a) ≥ 0. Define
Q^h(x, a) = Σ_y q^h(x, y|a). If (2.8) is to be meaningful, then we would expect
Q^h(x, a) ≠ 0. By defining Δt^h(x, a) = 1/Q^h(x, a) and p^h(x, y|a) =
q^h(x, y|a)/Q^h(x, a), we put (2.8) into the form of (2.7). Note that the
p^h(x, y|a) so defined are the transition probabilities of a controlled Markov
chain.
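As a sketch of this normalization (under the same illustrative one-dimensional assumptions as in the previous sketch, with upwind coefficients standing in for the coefficients that would come out of (2.8)), one can pass from the finite difference coefficients to an interpolation interval and transition probabilities as follows.

```python
# Sketch of the normalization q^h -> (Delta t^h, p^h) for the one-dimensional
# operator b(x,a)V_x + (sigma^2/2)V_xx; the upwind coefficients below are an
# illustrative assumption, not the text's (2.8).
def markov_chain_from_fd(x, a, h, b, sigma2):
    q = {
        x + h: sigma2 / (2.0 * h ** 2) + max(b(x, a), 0.0) / h,   # q^h(x, x+h | a)
        x - h: sigma2 / (2.0 * h ** 2) + max(-b(x, a), 0.0) / h,  # q^h(x, x-h | a)
    }
    Q = sum(q.values())                      # Q^h(x, a)
    dt = 1.0 / Q                             # Delta t^h(x, a) = 1 / Q^h(x, a)
    p = {y: qy / Q for y, qy in q.items()}   # p^h(x, y|a) = q^h(x, y|a) / Q^h(x, a)
    return dt, p

# The q^h are nonnegative, so the p^h are indeed transition probabilities:
dt, p = markov_chain_from_fd(0.5, 0.3, 0.02, lambda x, a: a, 0.1)
assert abs(sum(p.values()) - 1.0) < 1e-12 and all(v >= 0.0 for v in p.values())
```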
Owing to the presence of the supremum operation in (2.7) it is difficult
to directly draw a conclusion on the properties that the p^h(x, y|a) must
satisfy. However, by choosing the linear function φ(x) = p'x, we see that
(A2.3) implies

lim_{h→0} sup_{a∈U} [ - (1/Δt^h(x, a)) Σ_y p^h(x, y|a)(y - x)'p - k(x, a) ]
    = sup_{a∈U} [ -b(x, a)'p - k(x, a) ].     (2.9)

If we make the plausible assumption that in order to achieve (2.9) we will
need

(1/Δt^h(x, a)) Σ_y p^h(x, y|a)(y - x) → b(x, a)

uniformly in a ∈ U, then (A2.3) would require (2.2). By using quadratic
functions, we see that (A2.3) would also require (2.3).
This suggests that, although it may not be necessary in every case, we
would still expect, in general, that a finite difference scheme chosen to
satisfy (A2.1)-(A2.3) would also produce a Markov chain that is locally
consistent, at least for the points in G_h^0. Analogous remarks hold for points
in ∂G_h^+.
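The local consistency one expects can be checked directly for the chain constructed in the sketch above: the normalized conditional mean and covariance of the increment reproduce b(x, a) and a(x) up to O(h), as (2.2) and (2.3) require. The one-dimensional data below are again an illustrative assumption, not from the text.

```python
# Numerical check of local consistency (2.2)-(2.3) for the upwind chain:
# (mean increment)/dt should equal b(x,a), and (variance of increment)/dt
# should equal sigma^2 up to O(h).
x, a, h, sigma2 = 0.5, 0.3, 0.01, 0.1
bxa = a                                      # drift b(x,a) = a (assumed form)
q_up = sigma2 / (2 * h ** 2) + max(bxa, 0.0) / h
q_dn = sigma2 / (2 * h ** 2) + max(-bxa, 0.0) / h
Q = q_up + q_dn
dt, p_up, p_dn = 1.0 / Q, q_up / Q, q_dn / Q
mean = p_up * h - p_dn * h                   # E[xi_{n+1} - xi_n | xi_n = x, a]
var = p_up * h ** 2 + p_dn * h ** 2 - mean ** 2
print(mean / dt)    # equals b(x,a) = 0.3 exactly here, consistent with (2.2)
print(var / dt)     # equals sigma^2 + O(h) = 0.1 + O(h), consistent with (2.3)
```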

16.3 Proof of Convergence


Theorem 3.1. Assume (A1.1), (A1.2), (A2.1), (A2.2), (A2.3), and that
V(·) defined by (1.1) is continuous. Then for V^h(·) defined by (2.1), we
have V^h(y^h) → V(x) for every x ∈ G and every sequence y^h ∈ G_h^0 ∪ ∂G_h^+
with y^h → x.

Proof. For x ∈ G, we define

V*(x) = limsup_{δ→0} limsup_{h→0} sup {V^h(y) : |x - y| ≤ δ, y ∈ G_h^0 ∪ ∂G_h^+},

V_*(x) = liminf_{δ→0} liminf_{h→0} inf {V^h(y) : |x - y| ≤ δ, y ∈ G_h^0 ∪ ∂G_h^+}.

These definitions imply that V*(·) is upper semicontinuous, V_*(·) is lower
semicontinuous, and V*(·) ≥ V_*(·). We will prove that V*(·) is a subsolution
and that V_*(·) is a supersolution. Assuming for now that this is true, we
can apply Theorem 1.1 and conclude

V*(x) ≤ V(x) ≤ V_*(x) ≤ V*(x).
Thus, we obtain the conclusion of the theorem.
We now prove that V*(·) is a subsolution. The proof that V_*(·) is a
supersolution is essentially the same and omitted. Let φ(·) ∈ C²(G_1), and
assume that x_0 ∈ G is a maximum point of V*(·) - φ(·). We wish to show
that

F_*(x_0, V*(x_0), φ_x(x_0), φ_xx(x_0)) ≤ 0.     (3.1)

We first note that by the continuity given in (1.7), (3.1) follows if it can
be proved for functions of the form

φ_δ(x) = φ(x) + δ|x - x_0|²,   δ > 0.

This allows us to assume without loss that x_0 is a strict local maximum of
V*(·) - φ(·). By (A2.2) we can further assume that x_0 is a strict maximum
on G by suitably redefining φ(·) outside an appropriate open neighborhood
of x_0. Let x^h be a point in G_h^0 ∪ ∂G_h^+ at which V^h(·) - φ(·) attains its
maximum, relative to G_h^0 ∪ ∂G_h^+. The definition of V*(·) and the assumption
that x_0 is a strict global maximum of V*(·) - φ(·) imply the existence of
a subsequence of h (again denoted by h) such that x^h ∈ G_1, x^h → x_0,
and also V^h(x^h) → V*(x_0). By the definition of V^h(·),

S^h(x^h, V^h(x^h), V^h(·)) = 0.     (3.2)

Owing to the definition of x^h,

φ(·) + [V^h(x^h) - φ(x^h)] ≥ V^h(·).

This inequality, (A2.1), and (3.2) then yield

S^h(x^h, φ(x^h) + [V^h(x^h) - φ(x^h)], φ(·) + [V^h(x^h) - φ(x^h)]) ≤ 0,

where we interpret φ(·) here as the restriction of φ(·) to G_h^0 ∪ ∂G_h^+. Using
(A2.3) we have

F_*(x_0, V*(x_0), φ_x(x_0), φ_xx(x_0))
    ≤ liminf_{h→0} S^h(x^h, φ(x^h) + [V^h(x^h) - φ(x^h)], φ(·) + [V^h(x^h) - φ(x^h)])
    ≤ 0,

which is (3.1). ∎
References

[1] M. Akian. Resolution numerique d'equations d'Hamilton-Jacobi-


Bellman au moyen d'algorithmes multigrilles et d'iterations sur les
politiques. In Eighth Conference on Analysis and Optimization of
Systems, Antibes, France, 1988. INRIA.
[2] M. Akian. Methodes Multigrilles en Controle Stochastique. PhD the-
sis, University of Paris, 1990.
[3] E. Altman and H. J. Kushner. Admission control for combined guar-
anteed performance and best effort communications systems under
heavy traffic. SIAM J. Control and Optimization, 37:1780-1807,
1999.
[4] E. Altman and H. J. Kushner. Control of polling in presence of
vacations in heavy traffic with applications to satellite and mobile
radio systems. To appear, SIAM J. Control and Optimization, 2000.
[5] R. Anderson and S. Orey. Small random perturbations of dynami-
cal systems with reflecting boundary. Nagoya Math. J., 60:189-216,
1976.
[6] G. Barles and B. Perthame. Discontinuous solutions of deterministic
optimal stopping time problems. Model. Math. Anal. Num., 21:557-
579, 1987.
[7] G. Barles and B. Perthame. Exit time problems in optimal con-
trol and the vanishing viscosity method. SIAM J. Control Optim.,
26:1133-1148, 1988.

[8] G. Barles and P. Souganidis. Convergence of approximation schemes


for fully nonlinear second order equations. J. Asymptotic Analysis,
4:271-283, 1991.

[9] M. S. Bazaraa and J. J. Jarvis. Linear Programming and Network


Flows. Wiley, New York, 1977.
[10] L. Berkovitz. Optimal Control Theory, volume 12 of Applied Mathe-
matical Sciences. Springer-Verlag, Berlin, 1974.
[11] D.P. Bertsekas. Dynamic Programming: Deterministic and Stochas-
tic Models. Prentice-Hall, Englewood Cliffs, NJ, 1987.
[12] D. P. Bertsekas and D. A. Castanon. Adaptive aggregation meth-
ods for infinite horizon dynamic programming. IEEE Trans. Auto.
Control, 34:589-598, 1989.
[13] P. Billingsley. Convergence of Probability Measures. John Wiley, New
York, 1968.

[14] M. Boue and P. Dupuis. Markov chain approximations for determin-


istic control problems with affine dynamics and quadratic cost in the
control. SIAM J. on Numerical Analysis, 36:667-695, 1999.
[15] E.-K. Boukas and A. Haurie. Manufacturing flow control and pre-
ventative maintenance: A stochastic control approach. IEEE Trans.
Auto. Control, 35:1024-1031, 1990.
[16] W. L. Briggs. A Multigrid Tutorial. SIAM, Philadelphia, 1987.

[17] A. Budhiraja and H. J. Kushner. Approximation and limit results


for nonlinear filters over an infinite time interval. To appear, SIAM
J. Control and Optimization, 2000.

[18] A. Budhiraja and H. J. Kushner. Approximation and limit results for


nonlinear filters over an infinite time interval: Part II, random sam-
pling algorithms. To appear, SIAM J. on Control and Optimization,
2000.

[19] F. Campillo. Optimal ergodic control of nonlinear stochastic systems.


Technical Report 1257, INRIA, Sophia Antipolis, France, 1990.
[20] P. Chancelier, C. Gomez, J.-P. Quadrat, and A. Sulem. Automatic
study in stochastic control. In W. Fleming and P.-L. Lions, editors,
Vol. 10, IMA Volumes in Math. and Its Applications, Berlin, 1988.
Springer-Verlag.
[21] P. Chancelier et al. An expert system for control and signal pro-
cessing with automatic FORTRAN program generation. In Math.
Systems Symp., Stockholm, 1986. Royal Inst of Technology.

[22] P. L. Chow, J.-L. Menaldi, and M. Robin. Additive control of stochas-


tic linear systems with finite horizons. SIAM J. Control Optimization,
23:858-899, 1985.
[23] K.-L. Chung. Markov Chains with Stationary Transition Probabili-
ties. Springer-Verlag, Berlin, 1960.
[24] S.-L. Chung and F. B. Hanson. Optimization techniques for stochas-
tic dynamic programming. In Proceedings of the 29th IEEE Con-
ference on Decision and Control, Honolulu, New York, 1990. IEEE
Publishers.
[25] S.-L. Chung and F. B. Hanson. Parallel optimization for computa-
tional stochastic dynamic programming. In P. C. Yew, editor, Pro-
ceedings of the 30th IEEE Conference on Parallel Processing, Univer-
sity Park, PA, 1990. Penn State University Press.
[26] S.-L. Chung, F. B. Hanson, and H. H. Xu. Finite element method for
parallel stochastic dynamic programming. In Proceedings of the 30th
IEEE Conference on Decision and Control, Bristol, England, New
York, 1991. IEEE Publishers.

[27] C. Costantini. The Skorokhod oblique reflection problem and a dif-
fusion approximation for a class of transport processes. PhD thesis,
University of Wisconsin, Madison, 1987.
[28] R. Courant and D. Hilbert. Methods of Mathematical Physics, vol-
ume 1. Interscience, New York, 1937. (first english edition).
[29] M. G. Crandall, L. C. Evans, and P.-L. Lions. Some properties of
viscosity solutions of Hamilton-Jacobi equations. Trans. Amer. Math.
Soc., 282:487-501, 1984.
[30] M. G. Crandall, H. Ishii, and P.-L. Lions. User's guide to viscosity
solutions of second order partial differential equations. Bull. Amer.
Math. Soc., N. S., 27:1-67, 1992.
[31] M. G. Crandall and P.-L. Lions. Viscosity solutions of Hamilton-
Jacobi equations. Trans. Amer. Math. Soc., 277:1-42, 1983.
[32] G. Dahlquist and A. Bjorck. Numerical Methods. Prentice-Hall, En-
glewood Cliffs, NJ, 1974.
[33] J. Dai. Steady state analysis of reflected Brownian motions: charac-
terization, numerical methods and queueing applications. PhD thesis,
Operations Research Dept., Stanford University, 1990.
[34] J. G. Dai and J. M. Harrison. Reflected Brownian motion in an
orthant: numerical methods for steady state analysis. Ann. Appl.
Probab., 2:65-86, 1992.

[35] J. G. Dai and J. M. Harrison. The QNET method for two moment
analysis of closed manufacturing systems. Ann. Appl. Probab., 3:968-
1012, 1993.

[36] J. G. Dai, D. H. Yeh, and C. Zhou. The QNET method for reentrant
queueing networks with priority disciplines. Operations Research,
45:610-623, 1997.

[37] M. H. A. Davis and A. R. Norman. Portfolio selection with transac-


tions costs. Math. Oper. Res., 15:676-713, 1990.

[38] C. Derman. Denumerable state Markov decision processes-average


criterion. Ann. Math. Statist., 37:1545-1554, 1966.
[39] G. B. DiMasi and W. J. Runggaldier. Approximations and bounds
for discrete time nonlinear filtering. In Lecture Notes in Control and
Information Sciences, volume 44, pages 191-202. Springer-Verlag,
Berlin, 1982.

[40] P. Dupuis and R. S. Ellis. Large deviations for Markov processes


with discontinuous statistics, II: Random walks. Probab. Theory Rel.
Fields, 91:153-194, 1992.

[41] P. Dupuis and H. Ishii. SDE's with oblique reflection on nonsmooth


domains. Annals Probab., 21:554-580, 1993.
[42] P. Dupuis and H. Ishii. On Lipschitz continuity of the solution map-
ping to the Skorokhod problem, with applications. Stochastics, 35:31-
62, 1991.

[43] P. Dupuis and H. Ishii. On oblique derivative problems for fully non-
linear second-order elliptic PDE's on domains with corners. Hokkaido
Math J., 20:135-164, 1991.

[44] P. Dupuis, H. Ishii, and H. M. Soner. A viscosity solution approach to


the asymptotic analysis of queueing systems. Annals Probab., 18:226-
255, 1990.

[45] P. Dupuis and H. J. Kushner. Large deviations estimates for systems


with small noise effects, and applications to stochastic systems theory.
SIAM J. Control Optimization, 24:979-1008, 1986.

[46] P. Dupuis and J. Oliensis. An optimal control formulation and related


numerical methods for a problem in shape reconstruction. Annals
Applied Probab., 4:287-346, 1994.

[47] P. Dupuis and K. Ramanan. Convex duality and the Skorokhod


Problem, I and II. Prob. Th. and Rel. Fields, 115:153-195, 1999 and
115:197-236, 1999.

[48] P. Dupuis and A. Szpiro. Convergence of the optimal feedback poli-


cies in a numerical method for a class of deterministic optimal control
problems. SIAM J. on Control and Opt., submitted, 1999.

[49] E. B. Dynkin. Markov Processes. Springer-Verlag, Berlin, 1965.


[50] N. El Karoui and S. Meleard. Martingale measures and stochastic
calculus. Probability Theory and Related Fields, 84:83-101, 1990.
[51] R. Elliott. Stochastic Calculus and Applications. Springer-Verlag,
Berlin, 1982.

[52] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization


and Convergence. Wiley, New York, 1986.

[53] A. Federgruen and P. J. Schweitzer. Discounted and nondiscounted


value iteration Markov decision processes. In M.L. Puterman, editor,
Dynamic Progmmming and Its Applications. Academic Press, New
York, 1978.
[54] W. Feller. An Introduction to Probability Theory and its Applications,
Volume 2. Wiley, New York, 1966.

[55] W. H. Fleming and M. Nisio. On stochastic relaxed control for par-


tially observed diffusions. Nagoya Math. J., 93:71-108, 1984.

[56] W. H. Fleming and R. Rishel. Deterministic and Stochastic Optimal


Control. Springer-Verlag, Berlin, 1975.

[57] W. H. Fleming and H. M. Soner. Asymptotic expansions for Markov


processes with Levy generators. Appl. Math. Optimization, 19:203-
223, 1989.
[58] W. H. Fleming and H. M. Soner. Controlled Markov Processes and
Viscosity Solutions. Springer-Verlag, New York, 1992.
[59] M. I. Freidlin and A. D. Wentzell. Random Perturbations of Dynam-
ical Systems. Springer-Verlag, New York, 1984.

[60] A. Friedman. Stochastic Differential Equations and Applications.


Academic Press, New York, 1975.

[61] S. J. Gass. Linear Programming. McGraw Hill, New York, NY, fifth
edition, 1985.

[62] H. Goldstein. Classical Mechanics. Addison-Wesley, Reading, MA,


1950.
[63] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins
Press, Baltimore, second edition, 1989.

[64] W. Hackbusch. Multigrid Methods and Applications. Springer-


Verlag, Berlin, 1985.
[65] J. M. Harrison and V. Nguyen. The QNET method for two-moment
analysis of open queueing systems. Queueing Systems, 6:1-32, 1990.
[66] J. M. Harrison. Brownian Motion and Stochastic Flow Systems. Wi-
ley, New York, 1985.
[67] J. M. Harrison. Brownian models of queueing networks with het-
erogeneous customer populations. In W. Fleming and P.-L. Lions,
editors, Vol. 10, IMA Volumes in Math. and Its Applications, pages
147-186, Berlin, 1988. Springer-Verlag.
[68] J. M. Harrison and V. Nguyen. The QNET method for two moment
analysis of open queueing networks. Queueing Systems, 6:1-32, 1990.
[69] J. M. Harrison and M. I. Reiman. Reflected Brownian motion on an
orthant. Annals Probab., 9:302-308, 1981.
[70] J. M. Harrison and R. J. Williams. Brownian models of open queueing
networks with homogeneous customer populations. Stochastics and
Stochastics Rep., 22:77-115, 1987.
[71] J. W. Helton and M. R. James. Extending H∞ Control to Nonlinear
Systems. SIAM, 1999.
[72] B. K. P. Horn and Brooks. Shape from Shading. M.I.T. Press, Cam-
bridge, MA, 1989.
[73] R. Howard. Dynamic Programming and Markov Processes. M.I.T.
Press, Cambridge, MA, 1960.
[74] D. L. Iglehart and W. Whitt. Multiple channel queues in heavy
traffic. Adv. Appl. Probab., 2:150-177, 1970.
[75] N. Ikeda and S. Watanabe. Stochastic Differential Equations and
Diffusion Processes. North-Holland, Amsterdam, 1981.
[76] H. Ishii. Perron's method for Hamilton-Jacobi equations. Duke Math.
J., 55:369-384, 1987.
[77] H. Ishii. A boundary value problem of the Dirichlet type for
Hamilton-Jacobi equations. Ann. Sc. Norm. Sup. Pisa, (IV), 16:105-
135, 1989.
[78] J. Jacod. Calcul Stochastique et Problemes de Martingales. Springer-
Verlag, New York, 1979.
[79] J. Jacod and A. N. Shiryaev. Limit Theorems for Stochastic Pro-
cesses. Springer-Verlag, Berlin, 1987.

[80] R. Jensen. The maximum principle for viscosity solutions of fully non-
linear second order partial differential equations. Arch. Rat. Mech.
Anal., 101:1-27, 1988.

[81] L. C. M. Kallenberg. Linear programming and finite Markov control


problems. Technical report, Mathematical Center, Amsterdam, 1983.
Mathematical Center Tract no. 148.
[82] I. Karatzas and S. E. Shreve. Equivalent models for finite fuel
stochastic control. Stochastics, 18:245-276, 1986.
[83] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Cal-
culus. Springer-Verlag, New York, 1988.
[84] S. Karlin and H. M. Taylor. A Second Course in Stochastic Processes.
Academic Press, New York, 1981.
[85] T. G. Kurtz. Approximation of Population Processes, volume 36 of
CBMS-NSF Regional Conf. Series in Appl. Math. SIAM, Philadel-
phia, 1981.
[86] H. J. Kushner. Control of trunk line systems in heavy traffic. SIAM
J. Control Optim., 33:765-803, 1995.

[87] H. J. Kushner. Dynamical equations for nonlinear filtering. J. Diff.


Equations, 3:179--190, 1967.
[88] H. J. Kushner. Introduction to Stochastic Control Theory. Holt,
Rinehart and Winston, New York, 1972.
[89] H. J. Kushner. Probabilistic methods for finite difference approxi-
mation to degenerate elliptic and parabolic equations with Neumann
and Dirichlet boundary conditions. J. Math. Appl., 53:644-668, 1976.
[90] H. J. Kushner. Probability Methods for Approximations in Stochastic
Control and for Elliptic Equations. Academic Press, New York, 1977.
[91] H. J. Kushner. Optimality conditions for the average cost per unit
time problem with a diffusion model. SIAM J. Control Optimization,
16:330-346, 1978.
[92] H. J. Kushner. A robust computable approximation to the optimal
nonlinear filter. Stochastics, 3:75-83, 1979.
[93] H. J. Kushner. Approximation and Weak Convergence Methods for
Random Processes with Applications to Stochastic System Theory.
MIT Press, Cambridge, MA, 1984.
[94] H. J. Kushner. Numerical methods for stochastic control problems in
continuous time. SIAM J. Control Optimization, 28:999--1048, 1990.

[95] H. J. Kushner. Weak Convergence Methods and Singularly Perturbed


Stochastic Control and Filtering Problems, volume 3 of Systems and
Control. Birkhauser, Boston, 1990.
[96] H. J. Kushner. Domain decomposition methods for large Markov
chain control problems and nonlinear elliptic type problems. SIAM
J. Sci. Comput., 18:1494-1516, 1997.
[97] H. J. Kushner. A numerical method for reflected diffusions: Con-
trol of reflection directions and applications. J. Applied Math. and
Optimization, 33:61-79, 1996.
[98] H. J. Kushner. Robustness and convergence of approximations to
nonlinear filters for jump-diffusions. Computational and Applied
Math., 16:153-183, 1997.
[99] H. J. Kushner. Consistency issues for numerical methods for variance
control with applications to optimization in finance. IEEE Trans. on
Automatic Control, 44:2283-2296, 1999.
[100] H. J. Kushner. Heavy Traffic Analysis of Controlled and Uncontrolled
Queueing and Communication Networks. Springer Verlag, Berlin and
New York, 2000.
[101] H. J. Kushner and G. B. DiMasi. Approximations for functionals
and optimal control problems on jump-diffusion processes. J. Math.
Anal. Appl., 63:772-800, 1978.
[102] H. J. Kushner and H. Huang. Approximation and limit results for
nonlinear filters with wide bandwidth observation noise. Stochastics,
16:65--96, 1986.
[103] H. J. Kushner, D. Jarvis, and J. Yang. Controlled and optimally
controlled multiplexing systems: A numerical exploration. Queueing
Systems, 20:255--291, 1995.
[104] H. J. Kushner and A. J. Kleinman. Accelerated procedures for the
solution of discrete Markov control problems. IEEE Trans. Auto.
Control, AC-16:147-152, 1971.
[105] H. J. Kushner and A. J. Kleinman. Mathematical programming and
the control of Markov chains. Int. J. Control, 13:801-820, 1971.
[106] H. J. Kushner and L. F. Martins. Numerical methods for stochastic
singular control problems. SIAM J. Control Optimization, 29:1443-
1475, 1991.
[107] H. J. Kushner and K. M. Ramachandran. Optimal and approximately
optimal control policies for queues in heavy traffic. SIAM J. Control
Optimization, 27:1293-1318, 1989.

[108] H. J. Kushner and W. Runggaldier. Nearly optimal state feedback


controls for stochastic systems with wideband noise disturbances.
SIAM J. Control Optimization, 25:289-315, 1987.

[109] H. J. Kushner and J. Yang. Numerical methods for controlled routing


in large trunk line systems via stochastic control theory. ORSA J.
Computing, 6:300-316, 1994.

[110] H. J. Kushner and J. Yang. An effective numerical method for con-


trolling routing in large trunk line networks. Math. Computation
Simulation, 38:225-239, 1995.

[111] L. S. Lasdon. Optimization Theory for Large Systems. MacMillan,


New York, 1960.

[112] J.P. Lehoczky and S. E. Shreve. Absolutely continuous and singular


stochastic control. Stochastics, 17:91-110, 1986.

[113] A. J. Lemoine. Networks of queues: A survey of weak convergence


results. Management Science, 24:1175-1193, 1978.

[114] P.-L. Lions. Optimal control of diffusion processes and Hamilton-


Jacobi-Bellman equations. Part 2: Viscosity solutions and uniqueness.
Comm. P. D. E., 8:1229-1276, 1983.

[115] P.-L. Lions and A.-S. Sznitman. Stochastic differential equations with
reflecting boundary conditions. Comm. Pure Appl. Math., 37:511-
553, 1984.

[116] R. Liptser and A. N. Shiryaev. Statistics of Random Processes.


Springer-Verlag, Berlin, 1977.

[117] J. Mandel, editor. Proceedings of the Fourth Copper Mountain Con-


ference on Multigrid Methods, Philadelphia, 1989. SIAM.

[118] L. F. Martins and H. J. Kushner. Routing and singular control for


queueing networks in heavy traffic. SIAM J. Control Optimization,
28:1209-1233, 1990.

[119] S. F. McCormick. Multigrid Methods. SIAM, Philadelphia, PA, 1987.

[120] S. F. McCormick. Multilevel Adaptive Methods for Partial Differen-


tial Equations. SIAM, Philadelphia, 1989.

[121] R. Munos and A. Moore. Influence and variance of a Markov chain:


application to adaptive discretization in optimal control. In Pro-
ceedings of the 39th IEEE Conference on Decision and Control, New
York, 1999. IEEE Publishers.

[122] R. Munos and A. Moore. Variable resolution discretization for high-


accuracy solutions of optimal control problems. In T. Dean, edi-
tor, Proceedings of the International Joint Conference on Artificial
Intelligence-99, San Francisco, 1999. Morgan Kaufmann.
[123] R. Munos and A. Moore. Variable resolution discretization in optimal
control. Preprint, 1999.
[124] J. Oliensis and P. Dupuis. Direct method for reconstructing shape
from shading. In Proc. SPIE Conf. on Geometric Methods in Com-
puter Vision, pages 116-128, 1991.
[125] M. L. Puterman and M. C. Shin. Modified policy iteration algorithms for dis-
counted Markov decision problems. Management Science, 24:1127-
1137, 1978.
[126] M. L. Puterman. Markov decision processes. In D.P. Heyman and
M.J. Sobel, editors, Stochastic Models, Volume 2, chapter 8. North-
Holland, Amsterdam, 1991.
[127] J.-P. Quadrat. The discrete state linear quadratic problem. In
Proceedings of the 28th IEEE Conference on Decision and Control,
Austin, New York, 1989. IEEE Publishers.
[128] A. Quarteroni and A. Valli. Domain Decomposition Methods for Par-
tial Differential Equations. Oxford University Press, Oxford, 1999.
[129] M. I. Reiman. Open queueing networks in heavy traffic. Math. Oper.
Res., 9:441-458, 1984.
[130] M. I. Reiman and L. M. Wein. Dynamic scheduling of a two class
queue with setups. Operations Research, 46:532-547, 1998.
[131] S. Ross. Introduction to Stochastic Dynamic Programming. Aca-
demic Press, New York, 1983.
[132] S. M. Ross. Arbitrary state Markovian decision problems. Ann.
Math. Stat., 39:412-423, 1968.
[133] E. Rouy and A. Tourin. A viscosity solutions approach to shape-
from-shading. SIAM J. Num. Analysis, 1992.
[134] Y. Saisho. Stochastic differential equations for multi-dimensional do-
main with reflecting boundary. Probab. Theory Rel. Fields, 74:455-
477, 1987.
[135] J. N. T. Schult. Numerical solution of the Hamilton-Jacobi-Bellman
equation for a freeway traffic control problem. Technical report, Cen-
ter for Mathematics and Computer Science, Amsterdam, 1989. Note
BS N8901.

[136] A. N. Shiryaev. Stochastic equations of nonlinear filtering of jump


Markov processes. Problemy Peredachi Informatsii, 3:3-22, 1966.
[137] A. N. Shiryaev. Optimal Stopping Rules. Springer-Verlag, Berlin,
1978.
[138] A. N. Shiryayev. Probability. Springer-Verlag, New York, 1984.
[139] H. M. Soner and S. E. Shreve. Regularity of the value function for a
two-dimensional singular stochastic control problem. SIAM J. Con-
trol Optimization, 27:876-907, 1989.
[140] P. E. Souganidis. Approximation schemes for viscosity solutions of
Hamilton-Jacobi equations. J. Diff. Eq., 56:345-390, 1985.
[141] P. Stoll, C. W. Shu, and B. B. Kimia. Shock capturing numerical
methods for viscosity solutions of certain PDEs in computer vision:
The Godunov, Osher-Sethian and ENO schemes. Technical report,
Brown University, 1994. LEMS-132.
[142] J. C. Strikwerda. Finite Difference Schemes and Partial Differential
Equations. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1989.
[143] D. W. Stroock and S. R. S. Varadhan. On degenerate elliptic and
parabolic operators of second order and their associated diffusions.
Comm. Pure Appl. Math., 25:651-713, 1972.
[144] D. W. Stroock and S. R. S. Varadhan. Multidimensional Diffusion
Processes. Springer-Verlag, New York, 1979.
[145] A. Szpiro and P. Dupuis. Second order numerical methods for first
order Hamilton-Jacobi equations. SIAM J. on Numerical Analysis,
submitted, 1999.
[146] M. Taksar. Average optimal singular control and a related stopping
problem. Math. Oper. Res., 10:63-81, 1985.
[147] H. Tanaka. Stochastic differential equations with reflecting boundary
conditions in convex regions. Hiroshima Math. J., 9:163-177, 1979.
[148] L. M. Taylor and R. J. Williams. Existence and uniqueness of semi-
martingale reflecting Brownian motions in an orthant. Probab. The-
ory and Rel. Fields, 96:283-317, 1993.
[149] L. M. Taylor. Existence and Uniqueness of Semimartingale Reflecting
Brownian Motions in an Orthant. PhD thesis, University of Califor-
nia, San Diego, 1990.
[150] H. C. Tijms. Stochastic Modelling and Analysis: A Computational
Approach. Wiley, New York, 1986.

[151] H. M. Wagner. Principles of Operations Research. Prentice-Hall,


Englewood Cliffs, NJ, 1975.
[152] J. Warga. Relaxed variational problems. J. Math. Anal. Appl., 4:111-
128, 1962.

[153] L. M. Wein. Optimal control of a two station Brownian network.


Math. Oper. Res., 15:215-242, 1990.
[154] D. J. White. Dynamic programming, Markov chains and the method
of successive approximations. J. Math. Anal. Appl., 6:373-376, 1963.
[155] P. Whittle. Optimization Over Time, Dynamic Programming and
Stochastic Control. Wiley, New York, 1983.
[156] W. M. Wonham. Some applications of stochastic differential equa-
tions to optimal nonlinear filtering. SIAM J. Control, 2:347-369,
1965.

[157] L. C. Young. Generalized curves and the existence of an attained


absolute minimum in the calculus of variations. Compt. Rend. Soc.
Sci. et Lettres Varsovie CI III, 30:212-234, 1937.
[158] M. Zakai. On optimal filtering of diffusion processes. Z. Wahrsch.
Gebiete, 11:230-243, 1969.
Index

Absorbing boundary condition, 55 finite time problem, 336


Accelerated Gauss-Seidel method, Markov chain approximation,
167 154
Accelerated Jacobi method, 166
Accelerated methods Compensator, 361
comparisons, 168 Completely-S" condition, 135
Adaptive grid methods, 175 Continuous time interpolations
Admissible control law, 20, 33, 71, convergence, 290
76 Continuous time interpolation, 72,
Approximation in policy space, 156 75
ergodic cost, 197 convergence, 276
modified, 161 local properties, 71, 75
Approximation in value space reflecting boundary, 315
ergodic cost, 196 reflecting boundary, 305
approximation in value space, 156 representation, 305, 315
singular control, 315
Bellman equation, 81, 82 Contraction 46, 51
discounted cost, 130 Control
finite time, 337 admissible, 70, 71, 74, 76, 80,
heavy traffic model, 240 362
optimal stopping, 142, 298 pure Markov, 45, 48
reflecting boundary, 143 randomized, 45
singular control, 243 Controlled jump diffusion, 33, 269
unique solution, 156 convergence of costs, 281
Bellman equation, continuous time discontinuous dynamics, 275

limits of, 271 contraction, 198, 207


Controlled transition probability, cost function, 193, 202, 203
71 invariant measure, 193, 202,
Controlled variance, 91, 148 203
Convergence Jacobi iteration, 196
in distribution, 246 numerical methods, 199, 207
interpolated Markov chains, sense of uniqueness, 193
290 Error bounds
sequence of jump diffusions, Gauss-Seidel method, 166
271 Jacobi method, 165
Convergence of costs, 84 Exit time
lower bound, 292, 311 continuity of, 277
upper bound, 292, 313 randomized stopping, 281
without local consistency, 295 uniform integrability, 260, 294
Covariance matrix via a Girsanov transformation,
skew, 109 280
Explicit Markov chain approxima-
Deterministic problems tion
approximating cost functions, Bellman equation, 336
81 example, 327
Discontinuous cost terms, 275, 296, general method, 330
322, 385 Exterior cone condition, 373
Discontinuous dynamical terms, 275,
295, 322 Financial mathematics, 91
due to uneven grid, 126 Filtration, 8
Discount factor Finite difference approximation
state dependent, 39 central difference (symmetric),
effective averaged, 77 97
Domain decomposition, 171 general method, 106
Dynamic programming equation Markov chain interpretation,
(see Bellman equation) 92
one sided, 93
Ergodic control function uncontrolled Wiener process,
convergence of costs, 320 91
Ergodic control problem Finite time problem
approximation of, 320 decomposition, 338
convergence of costs, 322 solution methods, 338
Ergodic cost function Fokker-Planck equation, 340
stationarity of limit, 321
weak convergence, 321 Gauss-Seidel method, 159
Ergodic cost problem ordering of the states, 163,
approximation to, 201 436
Bellman equation, 194, 204 red-black, 172
boundary cost and control, 213 smoothing properties, 177
centering, 195, 198 Girsanov transformation, 18

Grid decomposition methods, 175 Linear programming, 183


Grid refinement, 175 basic feasible solution, 184
basic variable, 184
Heavy traffic, 216 complementary slackness, 185
assumptions, 224 dual problem, 184
boundary approximations, 236 Markov chain optimization,
formal dynamic programming 186
equation, 233 simplex procedure, 185
limits, 219, 220, 227, 231 Local consistency, 71, 128, 237,
Markov chain approximation, 371
234 relaxed, 123
production system, impulse con- with reflection directions, 136
trol, 228
routing control, 229 Markov chain approximations
Implicit Markov chain approxima- sticky boundary, 146
tion tightness, 290
Bellman equation, 337 weak convergence, 290, 308,
example, 331 378
general method, 333 Markov chains
representation of solution, 334 controlled, 48
Impulsive control discounted cost, 38
heavy traffic limit, 228 ergodic cost, 40
Intensity function, 422 reflecting boundary, 39
Interior cone condition, 373 stopping time, 43
Interpolated process, 72, 130, 239 Markov property, 48, 71
representation, 132, 138, 287 Martingale, 8
Interpolated time, 72 local, 9
Interpolation interval, 71 Martingale measure, 351
reflecting state, 137 Martingale problem, 251
Invariant measures Modified approximation in policy
approximation of, 203, 341 space, 161
Ito's formula, 12 Multigrid method
diffusion, 16 k-level procedure, 179
jump diffusion, 32 interpolation operator, 179
vector case, 14 projection operator, 179
smoothing properties of the
Jacobi method, 158 Gauss-Seidel procedure,
ergodic control, 196 176
Jump term Multigrid methods, 176
approximating chain, 128, 289
local consistency, 129 Nonanticipative, 11
properties, 127 control, 19, 33
Jump times Nonlinear filter
Markov chain interpolation, implicit approximation, 335
75 Nonlinear filtering, 340

approximations, 345 Reflection directions, 133, 236


convergence of approximations, Reflection terms
345 uniform integrability, 303, 316
Markov chain signal, 343 Relaxations, 160
representation formula, 342 Relaxed control
splitting the updating, 343 approximation of, 276
Normalization of transition prob- chattering theorem, 88
abilities, 100 deterministic, 85
Numerical noise equivalence of values, 86
relative, 149 optimal, 271
representation as a measure,
Optimal stopping 86
approximating chain, 297 Relaxed controls
existence of optimal, 297 deterministic, 85
for Markov chain, 42 Relaxed Poisson measure, 361
Optimal control Richardson extrapolation, 155
approximation of, 282
existence, 276 Semi-accelerated Gauss-Seidel method,
nonexistence, 86 166
Optimal stopping Shape-from-shading, 422
convergence of costs, 298 Simple function, 11
Singular control, 221, 315
Pathwise uniqueness, 15
admission control, 221
Picard iteration, 16
dynamic programming equa-
Poisson random measure, 30
tion, 243
characterization, 252
Markov chain approximation,
Principle of optimality, 44, 49
240
Predictable a-algebra, 352
routing control, 229
Predictable process, 352
Singular points, 422
Prohorov metric, 248
Skorokhod Problem, 24, 220, 302
Queueing system approximations, 136
admission control, 221 bounds on the reflection terms,
controlled heavy traffic limit, 303
221, 227 convention in jump case, 32
heavy traffic approximation, local consistency, 136
216 tightness of approximations,
heuristic approximation, 217 308
Skorokhod representation, 250
Randomized stopping, 281 Splitting the operator, 103, 112
Reflecting boundaries State dependent variance, 91, 148
approximations, 134, 236 Stationary measures
convergence of approximations, approximation of, 323
309 Stochastic differential equation, 14
convergence of costs, 311 with reflection, 25
Reflecting boundary, 21, 136, 302 Stochastic integral, 11

Stopping time, 9 Variable grids, 123


Strong existence, 15 lack of local consistency, 123
controlled diffusion, 20 Variance control, 100
Strong uniqueness, 15 Viscosity solution, 445
controlled diffusion, 20 subsolution, 446
Superlinear growth condition, 368 supersolution, 446

Target set, isolated point exam- Weak convergence, 246


ple, 426 Weak existence, 15
Tightness, 248 controlled diffusions, 20
Markov chain approximations, controlled jump diffusions, 33
290, 317 jump diffusions, 31
Time rescaling, 306, 316 Weak uniqueness, 15
inverse, 310 controlled diffusions, 20
Topology for the set [0, oo], 258 controlled jump diffusions, 33
jump diffusions, 31
Upwind approximation, 93 Wiener process
Uniqueness in the sense of prob- approximations to, 288
ability law, 15 characterization, 274, 310
random walk approximation,
78
List of Symbols

a± 94 ~T~ 75, 306


a(·) 14, 71 ~f~ 307
a(x, a) 348 ~e~ 11
ah(·) 71 ~'1/Jh(t) 75
bh(·) 71 e 41
B(S) 10 ei 106
B(U) 86 Et 342
B(U x [O,oo)) 86 E:F. 8
f3 38, 56 Ex 36
c(h) 378 E~ 48
cov 22 E; 48
C(S) 246 E;,n 52
Cb(S) 246 Eh,a
z,n
75
Co(S) 246 Eh,a 75
z,t
C(u) 155
Ch(u) 145, 154
11710 24
fx(x) 14
C"(U) 12 fxx(x) 14
Ck[O, T] 8 F(·) 445
Ck[O,oo) 8 F*(·) 446
Dk[O,T]8 F.(·) 446
Dk[O,oo) 8
:Ft 8
~th(x, a) 71 :F(A) 10
~t~ 71 G 23,70
~fh,o 332
co 54
~fh· 6
n 334 cg 71

'Y 40 ph,.S(x, n; y, mla) 332


-yh 202 ph• 6 (x,y) 335
-y(u) 193 PFt 8
'Y 204 ph,o.
x,n
75
-y(x,m) 320 ph,o. 75
x,t
;yh 204 P(S) 247
;yh(u) 204 11"(·) 40
r 26 1rh(u) 202
h(·) 30 11"(·, ·) 248
J~ 131 IT(.) 28, 127
J(t) 28 fi(x, H) 29, 128
J,(t) 32 aa;_ 144
Jh(t) 132, 287 ach 136
k11(x, a) 410 as 36
k< 0>(x,a) 386 as+ 39
k(x,a) 388 (¢, m)t 273
A 29, 127 1/Jh(.) 75, 130
A(x) 29, 128 ;j;h(.) 307
.c 16 q(·) 29, 127
.co. 21, 106 qh(·) 129
.Cu(·) 21, 34
Qh(x, a) 104, 107, 109
.Cw 252, 274 r(x) 24, 133, 444
.C* 340 r(x,y) 37
m(·) 86 r(x,yia) 51
mh(-) 287 rh(x, yia) 145
mh·6(-) 381 R(u) 155
mt(·) 86 Rh(u) 145
M~ 131 'R(U x [0, oo)) 263
Mh(t) 132 mh 1o6
J..l.h(x,u) 203 Pn 28, 127
J..l.'1 24 s 36
Nh 73 So 43
Ny(A) 414 ISI37
N(·) 30, 127 sh 11
N(t,H) 127 Sh(-) 449
Nh(·) 254 0"~ 293
Nm(·) 360 Eb(T) 11
Nh(t,H) 289 Eb 11
N(-) 270 Eb 11
Vn 29, 127 thn 72
v~ 130 ft,.S 335
n
p(x,y) 36 trB 14
p(x, yia) 48 TJu 164
ph(x, yia) 71 Tas,u 165
ph,.S (x, yia) 328

T/; 179 xa(·) 290


fh(t) 307 ~n 36
r 54 ~~ 70
'f 259 ~h(·) 72
rh 73 YoU 290
r~ 75, 137 z 0 (-) 290
f(·) 259, 278 (~· 6 332
f~ 307 ,h,o 332
n,O
uh 71 (~· 6 332
n
u 19 (h,o(t) 335
uh,o 372 (~· 6 335
wh(-) 256, 288 '* 246
Applications of Mathematics
(continued from page ii)

33 Embrechts/Kluppelberg/Mikosch, Modelling Extremal Events (1997)


34 Duflo, Random Iterative Models (1997)
35 Kushner/Yin, Stochastic Approximation Algorithms and Applications (1997)
36 Musiela/Rutkowski, Martingale Methods in Financial Modeling: Theory and
Application (1997)
37 Yin/Zhang, Continuous-Time Markov Chains and Applications (1998)
38 Dembo/Zeitouni, Large Deviations Techniques and Applications, Second Ed.
(1998)
39 Karatzas/Shreve, Methods of Mathematical Finance (1998)
40 Fayolle/Iasnogorodski/Malyshev, Random Walks in the Quarter Plane (1999)
41 Aven/Jensen, Stochastic Models in Reliability (1999)
42 Hernández-Lerma/Lasserre, Further Topics on Discrete-Time Markov Control
Processes (1999)
43 Yong/Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations (1999)
44 Serfozo, Introduction to Stochastic Networks (1999)
45 Steele, Stochastic Calculus and Financial Applications (2000)
