22.1 INTRODUCTION
The decision-making process often involves several decisions to be taken at different times. For example,
problems of inventory control, evaluation of investment opportunities, long-term corporate planning, and
so on, require sequential decision-making. The mathematical technique of optimizing a sequence of
interrelated decisions over a period of time is called dynamic programming. The dynamic programming
approach uses the idea of recursion to solve a complex problem by breaking it into a series of interrelated (sequential) decision stages (also called subproblems), where the outcome of a decision at one stage affects the decisions at each of the following stages. The word dynamic is used because time is explicitly taken into consideration.
Dynamic programming (DP) differs from linear programming in two ways:
(i) In DP, there is no set procedure (algorithm), as there is in LP, for solving any decision problem. The DP technique breaks the given problem into a sequence of smaller subproblems (stages), which are then solved in sequential order.
(ii) The LP approach provides a one-time-period (single-stage) solution to a problem, whereas the DP approach is useful for decision-making over time and solves each subproblem optimally.

22.2 DYNAMIC PROGRAMMING TERMINOLOGY

Regardless of the type or size of a decision problem, certain terms and concepts are commonly used in the dynamic programming approach to solving such problems.
Stage The dynamic programming problem can be decomposed or divided into a sequence of smaller sub-
problems called stages. At each stage there are a number of decision alternatives (courses of action) and
a decision is made by selecting the most suitable alternative. Stages represent different time periods in the
planning period. For example, in the replacement problem each year is a stage, in the salesman allocation
problem each territory represents a stage.
State Each stage in a dynamic programming problem is associated with a certain number of states. These states represent the various conditions of the decision process at a stage. The variables that specify the condition of the decision process, or describe the status of the system at a particular stage, are called state variables. These variables provide information for analyzing the possible effects that the current decision could have upon future courses of action. At any stage of the decision-making process there could be a finite or infinite number of states. For example, in the shortest-route problem a specific city serves as the state variable at any stage.
Return function At each stage, a decision is made that can affect the state of the system at the next stage and help in arriving at the optimal solution at the current stage. Every decision that is made has its own worth or benefit associated with it, and can be described in algebraic equation form, called a return function. This return function, in general, depends upon the state variable as well as the decision made at a particular stage. An optimal policy or decision at a stage yields the optimal (maximum or minimum) return for a given value of the state variable.
Figure 22.1 depicts the decision alternatives available at each stage for evaluation. The range of such decision alternatives and their associated returns at a particular stage is a function of the state input to the stage itself. The state input to a stage is the output from the previous (higher-numbered) stage, and that output is, in turn, a function of the state input to the previous stage and the decision taken at that stage. Thus, to evaluate any stage we need to know the value of the state input to it (there may be more than one state input to a stage) and the decision alternatives and their associated returns at that stage.
Fig. 22.1 Information Flow between Stages
748 Operations Research: Theory and Applications
For a multistage decision process, functional relationship among state, stage and decision may be
described as shown in Fig. 22.2.
Fig. 22.2 Functional Relationship among Components of a DP Model
In the backward process, a problem is solved by first solving the last stage and then working backwards towards the first stage, making optimal decisions at each stage of the problem. The forward process is used to solve a problem by first solving the initial stage and working towards the last stage, making an optimal decision at each stage of the problem.
The exact recursion relationship depends on the nature of the problem to be solved by dynamic programming. The one-stage return is given by:

        f1 = r1(s1, d1)

and the optimal value of f1 under the state variable s1 can be obtained by selecting a suitable decision variable d1. That is,

        f1*(s1) = Opt {r1(s1, d1)}
                   d1

The range of d1 is determined by s1, but s1 is determined by what has happened at Stage 2. Then at Stage 2 the return function would take the form:

        f2*(s2) = Opt {r2(s2, d2) * f1*(s1)};    s1 = t2(s2, d2)
                   d2

By continuing the above logic recursively for a general n-stage problem, we have:

        fn*(sn) = Opt {rn(sn, dn) * fn-1*(sn-1)};    sn-1 = tn(sn, dn)
                   dn

The symbol * denotes any mathematical relationship between sn and dn, including addition, subtraction and multiplication. A particular sequence of alternatives adopted by the decision-maker in a multistage decision problem is called a policy.
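The backward recursion above can be sketched in code. The following Python fragment is an illustrative sketch only: the stage data are hypothetical (not from this chapter), Opt is taken to be Min, the symbol * is taken to be addition, and the transformation t_n(s, d) is assumed to hand the decision itself to the next stage as its state.

```python
# Backward recursion: f_n*(s_n) = Min over d_n of { r_n(s_n, d_n) + f_{n-1}*(s_{n-1}) },
# where s_{n-1} = t_n(s_n, d_n).  All stage data below are hypothetical.

# r_n(s, d): return of taking decision d in state s at stage n
returns = {
    1: {("A", "x"): 4, ("A", "y"): 6, ("B", "x"): 3, ("B", "y"): 5},
    2: {("S", "A"): 2, ("S", "B"): 7},
}

def transition(s, d):
    # t_n(s, d): here the decision itself becomes the state of the next stage
    return d

def solve(n_stages, start_state):
    """Backward pass: build the f_n* tables from stage 1 up to stage n_stages."""
    # f_0* = 0 for every terminal state
    best = {0: {"x": (0, None), "y": (0, None)}}
    for n in range(1, n_stages + 1):
        best[n] = {}
        for s in {s for (s, _) in returns[n]}:
            best[n][s] = min(
                (returns[n][(s, d)] + best[n - 1][transition(s, d)][0], d)
                for (s2, d) in returns[n] if s2 == s
            )
    return best[n_stages][start_state]   # (optimal value, best decision at this stage)
```

With these numbers, `solve(2, "S")` returns `(6, "A")`: stage 1 gives f1*(A) = 4 and f1*(B) = 3, so at stage 2 the decision A yields 2 + 4 = 6 against 7 + 3 = 10 for B.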
The General Procedure
The procedure for solving a problem by using the dynamic programming approach can be summarized in
the following steps:
Step 1: Identify the problem decision variables and specify the objective function to be optimized under
certain limitations, if any.
Step 2: Decompose (or divide) the given problem into a number of smaller sub-problems (or stages).
Identify the state variables at each stage and write down the transformation function as a function of the state variable and the decision variable at the next stage.
Step 3: Write down a general recursive relationship for computing the optimal policy. Decide whether to follow the forward or the backward method for solving the problem.
Step 4: Construct appropriate tables to show the required values of the return function at each stage, as shown in Table 22.1.
Step 5: Determine the overall optimal policy or decisions and its value at each stage. There may be more than one such optimal policy.
Note that the optimal policy must be one such that, regardless of how a particular state is reached, all later decisions proceeding from that state must be optimal.
Fig. 22.3 Network of Routes
Solution To solve the problem, we need to define the problem stages, decision variables, state variables, return function and transition function. For this particular problem, the following definitions will be used to denote the various state and decision variables:
    dn = decision variable that defines the immediate destination when there are n (n = 1, 2, 3, 4) stages to go.
    sn = state variable that describes a specific city at any stage.
    D(sn, dn) = distance associated with the state variable sn and the decision variable dn for the current nth stage.
    fn(sn, dn) = minimum total distance for the last n stages, given that the salesman is in state sn and selects dn as the immediate destination.
    fn*(sn) = optimal path (minimum distance) when the salesman is in state sn with n more stages to go to reach the final stage (destination).
We start calculating distances between a pair of cities from destination city 10 (= x1) and work backwards
x 5 → x 4 → x 3 → x 2 → x1 to find the optimal path. The recursion relationship for this problem can be
stated as follows:
        fn*(sn) = Min {D(sn, dn) + fn-1*(dn)};    n = 1, 2, 3, 4
                   dn
We move backward to stage 3. Suppose that the salesman is at state s2 = 5 (node 5). Here he has to decide whether to go to node 8 (d2 = 8) or node 9 (d2 = 9). For this he must evaluate two sums:
        D(5, 8) + f1*(8) = 4 + 7 = 11   (to state s1 = 8)
        D(5, 9) + f1*(9) = 8 + 9 = 17   (to state s1 = 9)

The distance function for travelling from state s2 = 5 is the smaller of these two sums:

        f2*(5) = Min {11, 17} = 11   (to state s1 = 8)
                 d2 = 8, 9
Similarly, the calculation of the distance function for travelling from states s2 = 6 and s2 = 7 can be completed as follows:

For state s2 = 6:
        f2*(6) = Min {D(6, 8) + f1*(8) = 3 + 7 = 10;  D(6, 9) + f1*(9) = 7 + 9 = 16}
                 d2 = 8, 9
               = 10   (to state s1 = 8)

For state s2 = 7:
        f2*(7) = Min {D(7, 8) + f1*(8) = 8 + 7 = 15;  D(7, 9) + f1*(9) = 4 + 9 = 13}
                 d2 = 8, 9
               = 13   (to state s1 = 9)
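The three stage computations above can be checked mechanically. The Python sketch below encodes only the distances quoted in this example, together with the one-stage values f1*(8) = 7 and f1*(9) = 9, and applies the recursion f2*(s2) = Min over d2 of {D(s2, d2) + f1*(d2)}:

```python
# Distances D(s2, d2) quoted in the text, and the one-stage optimal values f1*.
D = {(5, 8): 4, (5, 9): 8,
     (6, 8): 3, (6, 9): 7,
     (7, 8): 8, (7, 9): 4}
f1 = {8: 7, 9: 9}

# f2*(s2) = min over d2 in {8, 9} of D(s2, d2) + f1*(d2)
f2, best_d2 = {}, {}
for s2 in (5, 6, 7):
    value, d2 = min((D[(s2, d)] + f1[d], d) for d in (8, 9))
    f2[s2], best_d2[s2] = value, d2

# Reproduces the entries of Table 22.3:
#   f2* = {5: 11, 6: 10, 7: 13}, reached via d2 = {5: 8, 6: 8, 7: 9}
```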
These results are entered into the two-stage table, as shown in Table 22.3.

Table 22.3  Stage 3

                      d2 = 8    d2 = 9    f2*(s2)    d2
    States, s2    5     11        17         11       8
                  6     10        16         10       8
                  7     15        13         13       9
The results that we obtain by continuing the same process for stages 4 and 5 are shown in Tables 22.4 and 22.5.

Table 22.4  Stage 4

                      d3 = 5    d3 = 6    d3 = 7    f3*(s3)      d3
    States, s3    2     18        20        18         18      5 or 7
                  3     14        18        17         14        5
                  4     17        20        18         17        5
Sequence:    10 → 8 → 5 → 3 → 1    or    10 → 8 → 5 → 4 → 1
Distances:   7 + 4 + 3 + 6 = 20    or    7 + 4 + 6 + 3 = 20
From the above, it is clear that there are two alternative shortest routes for this problem, both having a
minimum distance of 20 kilometres.
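As a quick arithmetic check, the leg-by-leg distances quoted for the two routes can be totalled:

```python
# Leg distances of the two shortest routes quoted in the text.
route_1 = [7, 4, 3, 6]   # 10 -> 8 -> 5 -> 3 -> 1
route_2 = [7, 4, 6, 3]   # 10 -> 8 -> 5 -> 4 -> 1
print(sum(route_1), sum(route_2))   # both routes total 20 km
```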