DPOCexam2017 Solution BB
Solutions
Number of Problems: 4
where
$$\mathbb{1} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad R = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.$$
Furthermore, the state xk ∈ R and the control input uk ∈ R2 . The cost function is given
by
$$\sum_{k=0}^{2} x_k.$$
yk+1 = ξk ,
where ξk is a random variable taking value i ∈ {1, 2, ..., m} with probability pξk (i). Given
yk−2 , wk is independent of all variables before time k; ξk is independent of all variables
before time k.
Convert the above problem into the standard problem formulation of dynamic program-
ming. In particular, write down the state vector, the system dynamics, and the distur-
bance vector with its probability density function (PDF) expressed as a function of the
given PDFs. The dimension of the state space should be as small as possible. You do
not have to solve the dynamic programming problem. [4 points]
c) Consider a discrete random variable x which is defined by the set of all its possible outcomes
X with X = {1, 2, 3}, and a PDF px . Each element of X can only occur with probability
0, 0.5, or 1. The objective is to find the PDF that attains the minimum expected value of
x:
$$\underset{p_x \in \mathcal{P}}{\text{minimize}} \quad \mathrm{E}[x] \qquad (1)$$
where $\mathcal{P}$ is the set of all possible PDFs of x. Define the state $y_k$ as $\sum_{i=1}^{k} p_x(i)$ for all k ≥ 1.
[11 points]
i) Formulate an equivalent problem that matches the standard form to which the dy-
namic programming algorithm can directly be applied, that is, explicitly state the
Final Exam – Dynamic Programming & Optimal Control Page 3
d) Consider a rectangular box with side lengths l > 0, w > 0, and h > 0, as shown in Fig.
1. The problem is to determine the side lengths that maximize the volume of the box,
subject to the constraint l + w + h = 1. [10 points]
[Figure 1: rectangular box with side lengths l, w, and h.]
i) Formulate an equivalent problem that matches the standard form to which the dy-
namic programming algorithm1 can directly be applied, that is, explicitly state the
• dynamics fk (xk , uk ) such that xk+1 = fk (xk , uk ), initial condition x0 , and what
xk and uk correspond to in the original problem.
• number of stages N such that k = 0, ..., N − 1.
• state-space Sk such that xk ∈ Sk .
• control-space Uk (xk ) such that uk ∈ Uk (xk ).
• stage costs gk (xk , uk ) and terminal cost gN (xN ), such that the total cost is
$$\sum_{k=0}^{N-1} g_k(x_k, u_k) + g_N(x_N).$$
You do not have to solve the dynamic programming problem.
ii) Is it possible to convert the above problem to a deterministic shortest path problem?
If so, draw the corresponding graph. In particular, draw all the vertices including
the starting node and the terminal node, and all the edges with the associated arc
lengths. If not, explain why.
¹ You may replace min with max, and correspondingly the gk (·) are rewards.
Solution 1
a) k = N = 2:
J2 (x2 ) = x2
k = 1:
Furthermore,
$$\frac{\partial^2 J_1(x_1)}{\partial u_1^2} = 2R$$
Its eigenvalues are 4 and 2, which are positive, and thus the matrix is positive definite.
The sufficient condition for optimality is therefore satisfied.
b) • Let sk := yk−2 , rk := yk−1 , then the augmented state vector x̃k := (xk , yk , rk , sk ).
Since the forecasts yk−2 , yk−1 , yk are known at time k, we still have perfect state
information.
• We define our new disturbance as w̃k := (wk , ξk ), with probability distribution
Note that wk depends only on x̃k (in particular sk ), and ξk does not depend on
anything.
• The dynamics therefore become
$$\tilde{x}_{k+1} = \begin{pmatrix} x_{k+1} \\ y_{k+1} \\ r_{k+1} \\ s_{k+1} \end{pmatrix} = \begin{pmatrix} f_k(x_k, u_k, w_k) \\ \xi_k \\ y_k \\ r_k \end{pmatrix} =: \tilde{f}_k(\tilde{x}_k, u_k, \tilde{w}_k).$$
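As a sanity check of the augmentation, the shift-register structure of f̃k can be sketched in a few lines. Since the original dynamics fk are not given in this extract, the `f_k` below is a hypothetical placeholder:

```python
# Sketch of the augmented update x~_{k+1} = f~_k(x~_k, u_k, w~_k).
# ASSUMPTION: the original dynamics f_k are not specified here, so
# f_k below is a hypothetical placeholder for illustration only.
def f_k(x, u, w):
    return x + u + w  # placeholder dynamics

def f_tilde(x_tilde, u, w_tilde):
    x, y, r, s = x_tilde   # x~_k = (x_k, y_k, r_k, s_k)
    w, xi = w_tilde        # w~_k = (w_k, xi_k)
    # the new forecast xi_k enters at y; older forecasts shift to r, s
    return (f_k(x, u, w), xi, y, r)

print(f_tilde((0.0, 1, 2, 3), 1.0, (0.5, 4)))  # (1.5, 4, 1, 2)
```

The point of the sketch is the last line of `f_tilde`: the forecast components simply shift down one slot per stage, which is why no extra state beyond (xk , yk , rk , sk ) is needed.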
c) i) • Stage index: the outcome set X has 3 elements, so there are 3 stages.
• State: $y_k = \sum_{i=1}^{k} p_x(i)$ for k = 1, 2, 3, with y0 = 0.
• State space: S0 = {0}; Sk = {0, 1/2, 1}, k = 1, 2; S3 = {1}.
• Control input: uk = px (k + 1), k = 0, 1, 2.
• Dynamics: yk+1 = yk + uk .
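The resulting DP is small enough to solve by brute force. A sketch, under the assumption (not stated explicitly in the bullets above) that the stage cost is gk = (k + 1)uk — which follows from E[x] = Σᵢ i·px (i) — with the terminal requirement y3 = 1:

```python
# DP sketch for part c): state y_k = sum_{i<=k} p_x(i), control
# u_k = p_x(k+1) in {0, 1/2, 1}, dynamics y_{k+1} = y_k + u_k.
# ASSUMPTION: stage cost g_k = (k+1)*u_k, from E[x] = sum_i i*p_x(i);
# terminal constraint y_3 = 1 (the PMF must sum to 1).
from fractions import Fraction

probs = [Fraction(0), Fraction(1, 2), Fraction(1)]

def solve():
    J = {Fraction(1): Fraction(0)}   # terminal stage: y_3 = 1 only
    policy = {}
    for k in reversed(range(3)):     # stages k = 2, 1, 0
        Jk = {}
        states = [Fraction(0)] if k == 0 else probs
        for y in states:
            best = None
            for u in probs:          # u_k = p_x(k+1)
                nxt = y + u
                if nxt in J:
                    c = (k + 1) * u + J[nxt]
                    if best is None or c < best:
                        best, policy[(k, y)] = c, u
            if best is not None:
                Jk[y] = best
        J = Jk
    return J[Fraction(0)], policy

cost, policy = solve()
print(cost, policy[(0, Fraction(0))])  # 1 1 -> put all mass on x = 1
```

The optimal PMF puts all probability on the outcome 1, giving E[x] = 1, consistent with the shortest-path picture in part ii).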
ii) Yes.
[Figure 2: shortest path graph. Vertices are the stage–state pairs (k, yk ): start node (0, 0); nodes (1, 0), (1, 0.5), (1, 1), (2, 0), (2, 0.5), (2, 1), (3, 1); and terminal node T. Arcs carry lengths equal to the stage costs (the values 0, 0.5, 1, 1.5, 2, 3 appear in the drawing).]
d) The problem is
$$\begin{aligned} \text{maximize} \quad & lwh \\ \text{s.t.} \quad & l + w + h = 1 \\ & l, w, h > 0. \end{aligned}$$
This is equivalent to maximizing $\ln(l) + \ln(w) + \ln(h)$, since ln(·) is a monotonically increasing function. We have seen its use in the Viterbi Algorithm.
xk+1 = xk + uk
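The formulation can also be checked numerically: discretize the allocated length xk on a grid and run the DP xk+1 = xk + uk with stage reward ln(uk ), i.e. the log transform above. The grid size N = 99 is an assumption, chosen so that the optimizer 1/3 lies exactly on the grid:

```python
# Numerical cross-check of part d) via DP on a grid.
# ASSUMPTION: N = 99 grid steps; state x_k = length already allocated,
# control u_k = next side length (in units of 1/N), reward ln(u_k).
import math

N = 99
grid = range(N + 1)

J = {N: 0.0}                 # terminal: the full length 1 must be used
for k in (2, 1, 0):          # three sides to choose
    Jk = {}
    for x in grid:
        best = -math.inf
        for u in range(1, N + 1 - x):  # side lengths are positive
            nxt = x + u
            if nxt in J:
                best = max(best, math.log(u / N) + J[nxt])
        if best > -math.inf:
            Jk[x] = best
    J = Jk

print(math.exp(J[0]))  # optimal volume, approximately 1/27
```

The printed volume agrees with the analytic optimum l = w = h = 1/3, i.e. a maximal volume of 1/27.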
For problems marked with *: Answers left blank are worth 0 points. Each wrong answer is
worth -1 point. You do not have to explain your answer. Each correct answer is worth 1 point.
The minimum score of Problem 2 is 0.
a) True or False questions. You do not have to explain your answer. [3 points]
i)* In dynamic programming, every finite state problem can be converted to a determin-
istic shortest path problem.
ii)* In the Viterbi algorithm, we are given a measurement sequence ZN = (z1 , . . . , zN ),
and we want to find the “most likely” state trajectory XN = (x0 , . . . , xN ). In particular, we solve for a maximum a-posteriori estimate X̂N := (x̂0 , . . . , x̂N ), where X̂N maximizes p(XN | ZN ) over all state trajectories XN .
Let Zk := (z1 , . . . , zk ) and Xk := (x1 , . . . , xk ) for some time k < N . The estimate X̂k
that maximizes p(Xk |Zk ) can always (in theory) be computed at the end of time k.
iii)* Consider any deterministic shortest path problem, and let −C be the smallest arc length, where C is positive. If we add C to every arc length, so that the smallest arc length becomes 0 and all arc lengths are thus non-negative, then we can always apply the label correcting algorithm to find the shortest path.
b) Suppose the label correcting algorithm was applied to a shortest path problem, producing
the following table:
Table 1
Answer the following questions pertaining to the above table. You do not have to explain
your answer. [10 points]
Solution 2
a) i) False
ii) True
iii) False
• (S, 2, 6, 7, 8)
• (S, 3, 4, 5, 7, 8)
• (S, 3, 6, 7, 8)
• (S, 2, 5, 7, 8).
iii) True.
iv) False.
v) False.
vi) There is insufficient data to determine this.
vii) There is insufficient data to determine this.
viii) c6,7 = 5.
For problems marked with *: Each correct answer is worth 1 point. Answers left blank are worth
0 points. Each wrong answer is worth -1 point. You do not have to explain your answer. The
minimum score of Problem 3 is 0.
a) True or False questions. You do not have to explain your answer. [5 points]
i)* In stochastic shortest path problems, the value iteration algorithm always converges
after a finite number of iterations.
ii)* In stochastic shortest path problems, the value iteration algorithm involves solving a
system of linear equations.
iii)* In stochastic shortest path problems, the policy iteration algorithm in discounted
problems can be initialized with an arbitrary admissible policy.
iv)* In stochastic shortest path problems, the policy iteration algorithm involves solving
a system of linear equations.
v)* In stochastic shortest path problems, let the state space be S = {0, 1, ..., n} with the
termination state 0, and Pµ ∈ Rn×n be the probability transition matrix associated
with a policy µ, whose (i, j)th entry is Pij (µ(i)) with i, j ∈ S\{0}. The invertibility
of the matrix (I − Pµ ) for the policy µ is equivalent to the properness of that policy.
b) You are implementing the policy iteration algorithm for a stochastic shortest path problem on the computer. You printed out the cost vector computed at each iteration; the cost vectors at the second and third iterations are shown in Table 2:
Table 2
c) Consider the stochastic shortest path problem represented in Figure 3, where at state
i ∈ {0, 1, 3}, the control action u can either be A or B, and at state i = 2, the control
action u can only be A. [10 points]
iv) For the policy in part i), construct the transition probability matrix Pµ ∈ R3×3 , whose
(i, j)th entry is Pij (µ(i)) with i, j ∈ {1, 2, 3}. Is this matrix invertible or not?
[Graph drawings omitted: (a) u = A, (b) u = B.]
Figure 3: Probability transition graph, with the associated probabilities p, and stage costs g,
denoted on each arc, that is,
P00 (A) = 1
P00 (B) = 1
P10 (A) = 1
P11 (B) = 1
P21 (A) = 0.5
P32 (B) = 0.5
P20 (A) = 0.5
P33 (B) = 0.5
P33 (A) = 1
g(i, B, j) = 1 ∀i ∈ {1, 3}, ∀j
g(i, A, j) = 1 ∀i ≠ 0, ∀j
g(0, B, 0) = 0
g(0, A, 0) = 0
v) The optimal cost vector to the above stochastic shortest path problem can be obtained
by solving a linear program of the generic form
$$\underset{V}{\text{minimize}} \quad f^{\top} V \quad \text{subject to} \quad M V \le h$$
where V , f and h are vectors, and M is a matrix. Write down a choice for f , h, and
M such that the optimal cost vector is obtained by solving the above linear program.
Solution 3
a) i) False
ii) False
iii) True
iv) True
v) True
b) The implementation is definitely wrong, since the cost associated with state 3 increases
after iteration 2. In policy iteration, the cost stays the same or decreases for any state
after each iteration.
c) i) True
ii) False
iii) False
iv)
$$P_\mu = \begin{pmatrix} 0 & 0 & 0 \\ 0.5 & 0 & 0 \\ 0 & 0.5 & 0.5 \end{pmatrix}$$
No, it is not.²
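Both claims — Pµ is singular, while (I − Pµ ) is invertible — can be verified with a plain determinant computation (a quick sketch, no external libraries assumed):

```python
# Sanity check: P_mu is singular, but (I - P_mu) is invertible.
# 3x3 determinant by cofactor expansion, in plain Python.
def det3(m):
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

P = [[0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5]]
I_minus_P = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(3)]
             for i in range(3)]

print(det3(P))          # 0.0 -> P_mu is not invertible
print(det3(I_minus_P))  # 0.5 -> (I - P_mu) is invertible
```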
v) The optimization problem for the stochastic shortest path problem has the form
$$\underset{V}{\text{minimize}} \ \sum_{i=1}^{3} f_i V(i) \quad \text{subject to} \quad V(i) \le q(i, u) + \sum_{j=1}^{3} P_{ij}(u) V(j), \quad \forall u \in \mathcal{U}(i), \ \forall i \in S \setminus \{0\}
$$
V (1) ≤ q(1, A)
V (1) ≤ q(1, B) + V (1)
V (2) ≤ q(2, A) + 0.5V (1)
V (3) ≤ q(3, A) + V (3)
V (3) ≤ q(3, B) + 0.5V (2) + 0.5V (3)
Since every stage cost is 1, the expected stage cost q(i, u) is equal to 1 for all i ∈ S \ {0} and u ∈ U(i). Hence we get the inequalities in matrix form:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -0.5 & 0.5 \end{pmatrix} V \le \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$
² Note that (I − Pµ ) is invertible.
therefore
$$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -0.5 & 0.5 \end{pmatrix}, \qquad h = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix},$$
and
$$f = \begin{pmatrix} -1 \\ -1 \\ -1 \end{pmatrix}.$$
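As a cross-check, value iteration on the same SSP recovers the optimal cost vector that this linear program characterizes; the transition rows below restate the probabilities from Figure 3:

```python
# Value iteration cross-check for the SSP (stage cost 1 in states 1..3).
# Each action row lists transition probabilities to states 1, 2, 3;
# any missing probability mass goes to the termination state 0.
actions = {
    1: [[0.0, 0.0, 0.0],   # u = A: to state 0 w.p. 1
        [1.0, 0.0, 0.0]],  # u = B: self-loop
    2: [[0.5, 0.0, 0.0]],  # u = A only: to state 1 or 0, w.p. 0.5 each
    3: [[0.0, 0.0, 1.0],   # u = A: self-loop
        [0.0, 0.5, 0.5]],  # u = B: to state 2 or stay, w.p. 0.5 each
}

V = {1: 0.0, 2: 0.0, 3: 0.0}
for _ in range(200):  # plenty of iterations for numerical convergence
    V = {i: min(1.0 + sum(p[j - 1] * V[j] for j in V) for p in actions[i])
         for i in V}

print([V[i] for i in (1, 2, 3)])  # [1.0, 1.5, 3.5]
```

The fixed point V = (1, 1.5, 3.5) satisfies every inequality of the LP with equality at the optimal actions, as expected.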
ẋ(t) = cos(θ(t))
ẏ(t) = sin(θ(t))
θ̇(t) = u(t)
where (x(t), y(t)) is the vehicle’s position on the plane at time t, θ(t) is its heading (see Fig. 4),
and u(t) ∈ [−1, 1], for all t, is the control input.
[Figure 4: vehicle position (x, y) on the plane and heading θ.]
The vehicle starts off at position (0, 0) with a heading of 0 at t = 0. The objective is to determine
the time-optimal trajectory that transfers the vehicle to position (0, 3).
a) Compute Pontryagin’s necessary conditions for optimality, including any singular arc con-
ditions. [9 points]
b) In a couple of sentences, motivate why the optimal trajectory must end in a singular arc,
and why u(0) = 1 is optimal. [1 point]
c) Compute the optimal state and input trajectories, and the optimal terminal time T , using
the hints from part b). Show that your solution satisfies the conditions from part a). [12
points]
Solution 4
a) Let x1 := x, x2 := y, x3 := θ.
Boundary conditions:
x(0) = (0, 0, 0)
x(T ) = (0, 3, free)
p3 (T ) = 0
H(x(t), u(t), p(t)) = 1 + p1 (t) cos(x3 (t)) + p2 (t) sin(x3 (t)) + p3 (t)u(t) = 0 ∀t (3)
Optimal input: minimizing the Hamiltonian over u(t) ∈ [−1, 1] gives
$$u(t) = \begin{cases} -1 & p_3(t) > 0 \\ 1 & p_3(t) < 0 \\ \text{undetermined} & p_3(t) = 0, \end{cases}$$
where the last case corresponds to a potential singular arc. Check if p3 (t) = 0 can occur non-trivially:
ṗ3 (t) = 0
⇔ c1 sin(x3 (t)) = c2 cos(x3 (t))
⇔ x3 (t) = arctan2(c2 , c1 )
⇒ u(t) = 0
b) Note that p3 (T ) = 0, so we can end with a singular arc. The intuition is this: apply
u(t) = 1 until the vehicle heading faces the target position at some time t = t̃, at which
point we switch to the singular arc, stop turning, and drive into the target position. Any
other solution would take more time.
x3 (t) = c3
ẋ1 = cos(c3 ) ⇒ x1 (t) = (t − T ) cos(c3 )
ẋ2 = sin(c3 ) ⇒ x2 (t) = (t − T ) sin(c3 ) + 3
p3 (t) = 0.
u(t) = 0 (4)
H(x(t), u(t), p(t)) = 1 + c1 cos(c3 ) + c2 sin(c3 ) = 0 (5)
u(t) = 1
x3 (t) = t
x1 (t) = sin(t)
x2 (t) = 1 − cos(t)
$$1 + 2\cos(\tilde{t}) = 0 \ \Rightarrow\ \tilde{t} = \frac{2\pi}{3}.$$
$$\Rightarrow\ T = \frac{2\pi}{3} + \sqrt{3}.$$
and thus $c_1 = \frac{1}{2}$, $c_2 = -\frac{\sqrt{3}}{2}$.
From (2) and (9):
$$p_3(t) = \int \left( \frac{1}{2}\sin(t) + \frac{\sqrt{3}}{2}\cos(t) \right) dt + c_4 = -\frac{1}{2}\cos(t) + \frac{\sqrt{3}}{2}\sin(t) + c_4$$
$$p_3\!\left( \frac{2\pi}{3} \right) = 0 \ \Rightarrow\ c_4 = -1.$$
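Finally, the claimed optimal trajectory can be checked by forward simulation (a rough Euler sketch; the step size is an arbitrary choice):

```python
# Numerical check of the claimed optimum: turn with u = 1 until
# t~ = 2*pi/3, then go straight (u = 0, the singular arc) until
# T = 2*pi/3 + sqrt(3); the vehicle should end up at (0, 3).
import math

t_switch = 2 * math.pi / 3
T = t_switch + math.sqrt(3)

def simulate(dt=1e-5):
    x = y = theta = t = 0.0
    while t < T:
        u = 1.0 if t < t_switch else 0.0  # bang arc, then singular arc
        x += math.cos(theta) * dt         # forward-Euler integration
        y += math.sin(theta) * dt
        theta += u * dt
        t += dt
    return x, y

x, y = simulate()
print(round(x, 3), round(y, 3))  # approximately (0, 3)
```

Up to discretization error, the terminal position matches the target (0, 3), confirming the switching time t̃ = 2π/3 and terminal time T = 2π/3 + √3.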