Week 9 - Probabilistic Dynamic Programming

The document discusses the principles of probabilistic dynamic programming, contrasting it with deterministic dynamic programming, and provides examples such as uncertain demand for milk allocation and a probabilistic inventory model. It emphasizes the importance of state and decision probabilities in determining optimal strategies for resource allocation and cost minimization. Additionally, it includes a scenario involving a bank line to illustrate how dynamic programming can be applied to maximize expected net rewards.

Uploaded by

razugugoi
Copyright
© © All Rights Reserved

EN: All rights pertaining to online course contents available on this
platform are reserved and protected by copyrights of the respective
course. Online contents on this platform cannot be used for any other
purposes, unless authorized in writing by the course instructor(s). The
contents may not be copied, altered, distributed, transmitted, or used
for miscellaneous purposes, in any way, in whole or in part.

TR: All rights to the content accessed through this platform are
reserved, and the copyrights of the relevant course are protected. The
content may not be used in any way without the written permission of
the academics teaching the course. No content may be copied, altered,
distributed, transmitted, or used for any other commercial purpose, in
whole or in part.

1
IE301

Operations Research II

Week 9 – Probabilistic Dynamic Programming

2
Probabilistic Dynamic Programming
• In deterministic dynamic programming, a
specification of the current state and current
decision was enough to tell us with certainty the new
state and the costs/rewards during the current stage.
• Different from deterministic dynamic programming,
in probabilistic dynamic programming, the state at
the next stage is not completely determined by the
state and policy decision at the current stage. There
is a probability distribution for what the next state
will be.

3
Basic Structure of Probabilistic Dynamic Programming
[Figure: at stage n, state sn and decision xn, with value fn(sn,xn), lead to state i = 1, 2, …, S at stage n+1 with probability pi; branch i carries the stage-n contribution Ci and ends at a node with value fn+1(i).]

• Called a decision tree if expanded to include all the possible
states and decisions at all the stages
4
Recursive Relationship
• Assume that the objective is to minimize the expected
sum of the contributions from the individual stages.
• fn(sn) denotes the minimum expected sum from stage n
till the end given that the state at stage n is sn.
• fn(sn,xn) denotes the expected sum from stage n till the
end given that the state and policy decision at stage n
are sn and xn, respectively.
• Then,
$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}(i) \right]$$
$$f_n(s_n) = \min_{x_n} f_n(s_n, x_n)$$
5
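As a minimal sketch of one step of this recursion, the Python snippet below evaluates fn(sn,xn) for each candidate decision and then takes the minimum. The decision names "A" and "B", the probabilities, the contributions and the fn+1 values are made-up illustrative numbers, not data from the lecture.

```python
def expected_cost(outcomes, f_next):
    """f_n(s_n, x_n) = sum_i p_i * (C_i + f_{n+1}(i)) for one decision.

    outcomes: list of (p_i, C_i, next_state_i) tuples.
    f_next:   dict mapping each next state i to f_{n+1}(i).
    """
    return sum(p * (c + f_next[i]) for p, c, i in outcomes)


def best_decision(decisions, f_next):
    """Return (x*, f_n(s_n), all costs) with f_n(s_n) = min over x_n."""
    costs = {x: expected_cost(outs, f_next) for x, outs in decisions.items()}
    x_star = min(costs, key=costs.get)
    return x_star, costs[x_star], costs


# Hypothetical numbers for a single state s_n with two possible decisions.
f_next = {1: 10.0, 2: 6.0}                       # f_{n+1}(1), f_{n+1}(2)
decisions = {
    "A": [(0.5, 4.0, 1), (0.5, 2.0, 2)],         # (p_i, C_i, next state)
    "B": [(0.25, 1.0, 1), (0.75, 3.0, 2)],
}

x_star, f_n, costs = best_decision(decisions, f_next)
print(x_star, f_n, costs)
```

Repeating this step for every state at stage n, and then moving to stage n-1, gives the usual backward pass.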
Example: Uncertain Demand
• For the price of $1/gallon, the Safeco
Supermarket chain has purchased 6 gallons of
milk from a local dairy.
• Each gallon of milk is sold in the chain’s three
stores for $2/gallon.
• The dairy must buy back at 50¢/gallon any milk
that is left at the end of the day.
• Unfortunately for Safeco, demand for each of the
chain’s three stores is uncertain.
• Safeco wants to allocate the 6 gallons of milk to
the three stores so as to maximize the expected
net daily profit earned from milk.
6
Example: Uncertain Demand
• Use dynamic programming to determine how
Safeco should allocate the 6 gallons of milk
among the three stores.
Demand Probability
Store 1 1 0.6
2 0
3 0.4
Store 2 1 0.5
2 0.1
3 0.4
Store 3 1 0.4
2 0.3
3 0.3
7
Example: Uncertain Demand
• With the exception of the fact that the
demand is uncertain, this problem is very
similar to the resource allocation problem.
– Stage: Each store
– State: The amount to be allocated to the
remaining stores
– Decision: Amount allocated to the current store
• Observe that since Safeco’s daily purchase
costs are always $6, we may concentrate our
attention on the problem of allocating the
milk to maximize daily revenue earned from
the 6 gallons.
8
Example: Uncertain Demand
• Define
– rt(gt) = expected revenue earned from gt
gallons assigned to store t
– ft(x) = maximum expected revenue earned
from x gallons assigned to stores t, t+1,…,3
• For t = 1, 2, we may write
$$f_t(x) = \max_{g_t} \left\{ r_t(g_t) + f_{t+1}(x - g_t) \right\}$$
where gt must be a member of {0,1,…,x}.


9
Example: Uncertain Demand
• We begin by computing the rt(gt)’s. For store 3, demand is 3 units with
probability 0.3, 2 units with probability 0.3, and 1 unit with probability 0.4.
So with 2 gallons allocated, store 3 sells 2 units with probability 0.6; with
probability 0.4 it sells 1 unit and salvages 1 unit.
r3(0)=0
r3(1)=0.4*2+0.3*2+0.3*2=2
r3(2)=(0.3+0.3)*2*2+0.4*(2+0.5)=3.40
r3(3)=0.3*6+0.3*(4+0.5)+0.4*(2+2*0.5)=4.35
Stage 3:
Let gt(x) be an allocation to store t that attains ft(x)
f3(0)=r3(0)=0 and g3(0)=0
f3(1)=r3(1)=2 and g3(1)=1
f3(2)=r3(2)=3.40 and g3(2)=2
f3(3)=r3(3)=4.35 and g3(3)=3

10
Example: Uncertain Demand
Stage 2:

11
Example: Uncertain Demand
Stage 1:

Optimal Solution:
Allocate 1 gallon to store 1, then from f2(5) allocate 3
gallons to store 2, and 2 gallons to store 3

12
Example: A Probabilistic Inventory Model
• We modify the inventory model discussed
previously to allow for uncertain demand.
• This will illustrate the difficulties involved in
solving a PDP for which the state during the
next period is uncertain.

13
Example: A Probabilistic Inventory Model

• Each period’s demand is equally likely to be 1
or 2 units.
• After meeting the current period’s demand
out of current production and inventory, the
firm’s end-of-period inventory is evaluated,
and a holding cost of $1 per unit is assessed.
• Because of limited capacity, the inventory at
the end of each period cannot exceed 3 units.
• It is required that all demand be met on time.
14
Example: A Probabilistic Inventory Model

• Any inventory on hand at the end of
period 3 can be sold at $2 per unit.
• Cost of production is 2x+3 where x is
number of units produced.
• At the beginning of period 1, the firm has
1 unit of inventory. Use dynamic
programming to determine a production
policy that minimizes the expected net
cost incurred during the three periods.
15
Example: A Probabilistic Inventory Model
• Define ft(i) to be the minimum expected net
cost incurred during the periods t, t +1,…,3
when the inventory at the beginning of period
t is i units. Then,
$$f_3(i) = \min_x \left\{ c(x) + \tfrac{1}{2}(i+x-1) + \tfrac{1}{2}(i+x-2) - \tfrac{1}{2} \cdot 2(i+x-1) - \tfrac{1}{2} \cdot 2(i+x-2) \right\}$$
Here the second and third terms are the expected holding cost, and the last two terms are the expected salvage value.

where x must be a member of {0,1,2,3,4} and
x must satisfy (2 - i) ≤ x ≤ (4 - i).
16
Example: A Probabilistic Inventory Model
• For t = 1, 2, we can derive the recursive
relation for ft(i) by noting that for any month
t and production level x, the expected costs
incurred during periods t, t+1, …,3 are the
sum of the expected costs incurred during
period t and the expected costs incurred
during periods t+1, t+2, …,3 .
• As before, if x units are produced during
month t, the expected cost during month t
will be
c(x) + (½) (i+x-1)+ (½)(i+x-2)
17
Example: A Probabilistic Inventory Model
• If x units are produced during month t,
the expected cost during periods t+1,
t+2, …,3 is computed as follows.
• Half of the time, the demand during
period t will be 1 unit, and the inventory
at the beginning of t+1 will be i + x – 1.
• Then, the expected cost incurred
during periods t+1, t+2, …,3 is ft+1(i+x-1).
18
Example: A Probabilistic Inventory Model
• Similarly, there is a ½ chance that the
inventory at the beginning of period t+1 will
be i + x – 2.
• In this case, the expected cost incurred
during periods t+1, t+2, …,3 will be ft+1(i+x-2).
• In summary, the expected cost during the
periods t+1, t+2, …,3 will be

(½) ft+1(i+x-1) + (½) ft+1(i+x-2)


19
Example: A Probabilistic Inventory Model
• With this in mind, we may write for t = 1,2
$$f_t(i) = \min_x \left\{ c(x) + \tfrac{1}{2}(i+x-1) + \tfrac{1}{2}(i+x-2) + \tfrac{1}{2} f_{t+1}(i+x-1) + \tfrac{1}{2} f_{t+1}(i+x-2) \right\}$$
where x must be a member of {0,1, 2, 3, 4}
and x must satisfy (2-i) ≤ x ≤ (4-i).

20
Example: A Probabilistic Inventory Model

• We define xt(i) to be a period t production
level attaining the minimum for ft(i).
• We now work backward until f1(1) is
determined.
• Since each period’s ending inventory must be
nonnegative and cannot exceed 3 units, the
state during each period must be 0,1,2 or 3.

21
A Probabilistic Inventory Model: f3(i)

22
A Probabilistic Inventory Model: f2(i)

23
A Probabilistic Inventory Model: f1(i)

Optimal Policy:
At stage 1, produce 3, then decide based on realized
demand
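
The backward pass for the inventory model can be sketched in Python as follows. Two assumptions are made here that the slides do not state: the fixed charge of 3 in c(x) = 2x+3 is incurred only when x > 0 (so c(0) = 0), and a terminal value f4(i) = -2i is introduced as a convenience so that the salvage term in f3(i) falls out of the same recursion used for t = 1, 2. The names `c`, `solve`, `f` and `policy` are chosen for illustration.

```python
MAX_INV = 3          # end-of-period inventory cannot exceed 3 units
DEMANDS = (1, 2)     # equally likely demands
LAST = 3             # number of periods


def c(x):
    """Production cost; assumes the fixed charge applies only if x > 0."""
    return 0.0 if x == 0 else 2.0 * x + 3.0


def solve():
    # f_4(i) = -2i encodes the $2-per-unit salvage value after period 3.
    f = {LAST + 1: {i: -2.0 * i for i in range(MAX_INV + 1)}}
    policy = {}
    for t in range(LAST, 0, -1):
        f[t], policy[t] = {}, {}
        for i in range(MAX_INV + 1):
            costs = {}
            # (2 - i) <= x <= (4 - i): demand is always met and the
            # ending inventory never exceeds 3 units
            for x in range(max(0, 2 - i), 4 - i + 1):
                costs[x] = c(x) + sum(
                    0.5 * ((i + x - d) + f[t + 1][i + x - d])   # holding + future
                    for d in DEMANDS)
            f[t][i] = min(costs.values())
            policy[t][i] = [x for x, v in costs.items()
                            if abs(v - f[t][i]) < 1e-9]
    return f, policy


f, policy = solve()
for t in (3, 2, 1):
    print(t, {i: (f[t][i], policy[t][i]) for i in range(MAX_INV + 1)})
```

With these assumptions the sketch yields f1(1) = 16.25 with x1(1) = 3, which agrees with the optimal policy stated above; the period 2 and period 3 production levels are then read from `policy` once the demand has been realized.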

24
Further Examples of Probabilistic
Dynamic Programming Formulations
• Many probabilistic dynamic programming problems
can be solved using recursions of the following form
(for max problems):
$$f_t(i) = \max_a \left\{ (\text{expected reward in stage } t \mid i, a) + \sum_j p(j \mid i, a, t)\, f_{t+1}(j) \right\}$$
• ft(i): max expected reward that can be earned during
stages t, t+1,…, given the state at the beginning of
stage t is i
• p(j|i,a,t): probability that the next stage’s state will be
j, given that current state is i (at stage t) and action a
is chosen

25
Another Example
• When Sally Mutton arrives at the bank,
30 minutes remain in her lunch break.
• If Sally makes it to the head of the line
and enters service before the end of her
lunch break, she earns reward r.
• However, Sally does not enjoy waiting in
lines, so to reflect her dislike for waiting
in line, she incurs a cost of c for each
minute she waits.
26
Another Example
• During a minute in which n people are ahead of
Sally, there is a probability p(x|n) that x people
will complete their transactions.
• Suppose that when Sally arrives, 20 people are
ahead of her in line.
• Use dynamic programming to determine a
strategy for Sally that will maximize her
expected net revenue (reward-waiting costs).

27
Solution
• When Sally arrives at the bank, she must decide
whether to join the line or give up and leave.
• At any later time, she may also decide to leave if
it is unlikely that she will be served by the end
of her lunch break.
• We can work backward to solve the problem.
• ft(n): the maximum expected net reward that
Sally can receive from time t to the end of her
lunch break if at time t, n people are ahead of
her.

28
Solution
• Let t=0 be the current time and t=30 be the
end of the problem.
• Since t=29 is the beginning of the last minute
of the problem, we write
$$f_{29}(n) = \max \begin{cases} 0 & \text{(Leave)} \\ r\,p(n \mid n) - c & \text{(Stay)} \end{cases}$$
• If Sally chooses to leave at time 29, she earns
no reward and incurs no more costs.
29
Solution
$$f_{29}(n) = \max \begin{cases} 0 & \text{(Leave)} \\ r\,p(n \mid n) - c & \text{(Stay)} \end{cases}$$
• If she stays at time 29, she will incur a waiting
cost of c (a revenue of –c) and with probability
p(n|n) will enter service and receive a reward
r.
• Thus, if Sally stays, her expected net reward is
rp(n|n)-c
30
Solution
• For t<29,
$$f_t(n) = \max \begin{cases} 0 & \text{(Leave)} \\ r\,p(n \mid n) - c + \sum_{k<n} p(k \mid n)\, f_{t+1}(n-k) & \text{(Stay)} \end{cases}$$
• If Sally stays, she will earn an expected reward
of rp(n|n)-c during the current minute, and
with probability p(k|n), there will be n-k
people ahead of her; in this case, her
expected net reward from time t+1 to time 30
will be ft+1(n-k).
31
Solution
• Hence, if Sally stays, her overall expected
reward received from time t+1, t+2,…,30 will
be
$$\sum_{k<n} p(k \mid n)\, f_{t+1}(n-k)$$
• And, we have the following formula
$$f_t(n) = \max \begin{cases} 0 & \text{(Leave)} \\ r\,p(n \mid n) - c + \sum_{k<n} p(k \mid n)\, f_{t+1}(n-k) & \text{(Stay)} \end{cases}$$

32
Solution
• To determine Sally’s optimal waiting policy, we
work backward until f0(20) is computed.
• Sally stays until the optimal action is “leave” or
she begins to be served.
• Problems in which the decision maker can
terminate the problem by choosing a
particular action are known as stopping rule
problems; they often have a special structure
that simplifies the determination of optimal
policies.
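
A minimal Python sketch of this stopping rule recursion is given below. The slides leave r, c and p(x|n) unspecified, so the values used here are purely hypothetical: r = 10, c = 0.1, and a deterministic service process in which exactly one person finishes each minute while anyone is ahead of Sally. Setting f30(n) = 0 makes the general formula reduce to the expression for f29(n).

```python
def solve_waiting(r, c, p, n0, horizon):
    """Backward recursion for f_t(n); p(k, n) plays the role of p(k|n).

    Returns (f, stay), where stay[t][n] is True when staying is strictly
    better than leaving at time t with n people ahead.
    """
    f = [[0.0] * (n0 + 1) for _ in range(horizon + 1)]   # f[horizon][n] = 0
    stay = [[False] * (n0 + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for n in range(n0 + 1):
            stay_value = (r * p(n, n) - c
                          + sum(p(k, n) * f[t + 1][n - k] for k in range(n)))
            if stay_value > 0.0:          # leaving is worth exactly 0
                f[t][n] = stay_value
                stay[t][n] = True
    return f, stay


def one_per_minute(k, n):
    """Hypothetical p(k|n): exactly min(1, n) people finish each minute."""
    return 1.0 if k == min(1, n) else 0.0


f, stay = solve_waiting(r=10.0, c=0.1, p=one_per_minute, n0=20, horizon=30)
print("f0(20) =", round(f[0][20], 2), "stay at t=0:", stay[0][20])
```

In this deterministic illustration Sally waits 20 minutes and then enters service, so f0(20) = r - 20c. Any other distribution can be supplied through `p`, provided p(0|n) + … + p(n|n) = 1 for each n.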
33
Example: Determining Reject Allowances
• The HIT-AND-MISS MANUFACTURING COMPANY has
received an order to supply one item of a particular
type. However, the customer has specified such
stringent quality requirements that the manufacturer
may have to produce more than one item to obtain
an item that is acceptable. The number of extra
items produced in a production run is called the
reject allowance.
• The manufacturer estimates that each item of this
type that is produced will be acceptable with
probability 0.5 and defective (without possibility for
rework) with probability 0.5.
34
Example: Determining Reject Allowances
• Marginal production costs for this product are
estimated to be $100 per item (even if defective),
and excess items are worthless. In addition, a setup
cost of $300 must be incurred whenever the
production process is set up for this product.
• The manufacturer has time to make no more than
three production runs. If an acceptable item has not
been obtained by the end of the third production
run, the cost to the manufacturer in lost sales
income and penalty costs will be $1,600.

35
Example: Determining Reject Allowances
• The objective is to determine the policy
regarding the lot size (1 + reject allowance) for
the required production run(s) that minimizes
total expected cost for the manufacturer.

36
Example: Determining Reject Allowances
• Stage:
– Each production run (n = 1, 2, 3)
• Decision:
– How much to produce (lot size) in each stage (xn)
• State:
– Number of acceptable items still needed (1 or 0)
at beginning of stage n (sn)

37
Example: Determining Reject Allowances
• Notation:
– fn(sn, xn): total expected cost for stages n, …,
3 if system starts in state sn at stage n and
lot size is xn
– fn*(sn): the minimum expected cost for
stages n, …, 3 if system starts in state sn
• fn*(sn) = min xn fn(sn, xn)
• Note that fn*(0) = 0.

38
Example: Determining Reject Allowances
• Assume that the numbers in the following
calculations are in hundred dollars.
• Contribution to cost (actually production cost)
from stage n is K(xn)+xn
– K(xn) = 3 if xn>0; 0, otherwise.

39
Example: Determining Reject Allowances
• For sn=1,

• Note that f4*(1)=16

40
Example: Determining Reject Allowances
• Hence, the recursive relationship is as follows:

• Work backward to calculate f1*(1)
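
A sketch of how the recursion follows from the problem data, under the assumption (not stated explicitly above) that items are acceptable independently of one another: a lot of size xn contains no acceptable item with probability (1/2)^xn, in which case the state stays at 1; otherwise it drops to 0, where fn+1*(0) = 0. In hundreds of dollars:

```latex
\begin{align*}
f_n(1, x_n) &= K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1)
               + \left[1 - \left(\tfrac{1}{2}\right)^{x_n}\right] f_{n+1}^*(0) \\
            &= K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1), \\
f_n^*(1)    &= \min_{x_n = 0, 1, \dots}
               \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\},
               \qquad f_4^*(1) = 16 .
\end{align*}
```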

41
Example: Determining Reject Allowances
• For n=3,

42
Example: Determining Reject Allowances
• For n=2,

43
Example: Determining Reject Allowances
• For n=1,

44
Example: Determining Reject Allowances
• Optimal Solution:
– Produce two items on the first production run
– If none is acceptable, then produce either two or
three items on the second production run
– If none is acceptable, then produce either three or
four items on the third production run.
– Total expected cost is $675.
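
These values can be checked with a short Python sketch. It assumes items are acceptable independently of one another and caps the lot size at a hypothetical `MAX_LOT` of 8 to keep the search finite; costs are in hundreds of dollars, as above.

```python
PENALTY = 16.0     # f_4*(1): cost if no acceptable item after run 3
SETUP = 3.0        # K(x) = 3 when x > 0, else 0
P_DEFECT = 0.5     # probability that a single item is defective
RUNS = 3
MAX_LOT = 8        # hypothetical upper bound on the lot size


def solve():
    """Return {n: (f_n*(1), list of optimal lot sizes)} for n = 3, 2, 1."""
    f_next = PENALTY
    plan = {}
    for n in range(RUNS, 0, -1):
        # f_n(1, x) = K(x) + x + (1/2)^x * f_{n+1}*(1)
        costs = {x: (SETUP if x > 0 else 0.0) + x + P_DEFECT ** x * f_next
                 for x in range(MAX_LOT + 1)}
        best = min(costs.values())
        plan[n] = (best, [x for x, v in costs.items() if abs(v - best) < 1e-9])
        f_next = best
    return plan


plan = solve()
for n in (1, 2, 3):
    print("run", n, "f* =", plan[n][0], "lot sizes:", plan[n][1])
```

Under these assumptions the sketch gives f1*(1) = 6.75, that is $675, with the lot sizes listed above.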

45
