Algorithms Lecture 4: Dynamic Programming
Those who cannot remember the past are doomed to repeat it.
— George Santayana, The Life of Reason, Book I: Introduction and Reason in Common Sense (1905)
The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington
named Wilson. He was secretary of Defense, and he actually had a pathological fear and hatred of the word
‘research’. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red,
and he would get violent if people used the term ‘research’ in his presence. You can imagine how he felt,
then, about the term ‘mathematical’. The RAND Corporation was employed by the Air Force, and the Air
Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air
Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what
name, could I choose?
— Richard Bellman, on the origin of his term ‘dynamic programming’ (1984)
If we all listened to the professor, we may be all looking for professor jobs.
— Pittsburgh Steelers' head coach Bill Cowher, responding to
David Romer’s dynamic-programming analysis of football strategy (2003)
4 Dynamic Programming
4.1 Fibonacci Numbers
The Fibonacci numbers Fn , named after Leonardo Fibonacci Pisano1 , the mathematician who pop-
ularized ‘algorism’ in Europe in the 13th century, are defined as follows: F0 = 0, F1 = 1, and
Fn = Fn−1 + Fn−2 for all n ≥ 2. The recursive definition of Fibonacci numbers immediately gives
us a recursive algorithm for computing them:
RecFibo(n):
  if (n < 2)
    return n
  else
    return RecFibo(n − 1) + RecFibo(n − 2)
How long does this algorithm take? Except for the recursive calls, the entire algorithm requires
only a constant number of steps: one comparison and possibly one addition. If T (n) represents the
number of recursive calls to RecFibo, we have the recurrence

  T(0) = 1,    T(1) = 1,    T(n) = T(n − 1) + T(n − 2) + 1.

This looks an awful lot like the recurrence for Fibonacci numbers! The annihilator method gives us
an asymptotic bound of Θ(φ^n), where φ = (√5 + 1)/2 ≈ 1.61803398875, the so-called golden ratio,
is the largest root of the polynomial r² − r − 1. But it's fairly easy to prove (hint, hint) the exact
solution T(n) = 2Fn+1 − 1. In other words, computing Fn using this algorithm takes more than
twice as many steps as just counting to Fn!
Another way to see this is that RecFibo is building a big binary tree of additions, with
nothing but zeros and ones at the leaves. Since the eventual output is Fn, our algorithm must
call RecFibo(1) (which returns 1) exactly Fn times. A quick inductive argument implies that
RecFibo(0) is called exactly Fn−1 times. Thus, the recursion tree has Fn + Fn−1 = Fn+1 leaves,
and therefore, because it's a full binary tree, it must have 2Fn+1 − 1 nodes.
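For a concrete check of these counts, take n = 6 (a small worked instance, not from the original
notes): RecFibo(6) calls RecFibo(1) exactly F6 = 8 times and RecFibo(0) exactly F5 = 5 times, so the
recursion tree has F6 + F5 = F7 = 13 leaves and 2 · 13 − 1 = 25 nodes.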
1 literally, “Leonardo, son of Bonacci, of Pisa”
The recursive algorithm is slow because it computes the same Fibonacci numbers over and over
again. We can speed it up considerably just by writing down the result of each recursive call and
looking it up again if we need it later; this technique is called memoization.
MemFibo(n):
  if (n < 2)
    return n
  else
    if F[n] is undefined
      F[n] ← MemFibo(n − 1) + MemFibo(n − 2)
    return F[n]
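Here is a minimal Python sketch of the same memoized recursion, using a dictionary instead of the
array F[ ]; the names mem_fibo and memo are illustrative choices, not from the notes.

memo = {0: 0, 1: 1}

def mem_fibo(n):
    # Compute F(n), storing every intermediate result so that each value
    # is computed at most once.
    if n not in memo:
        memo[n] = mem_fibo(n - 1) + mem_fibo(n - 2)
    return memo[n]

For example, mem_fibo(40) returns 102334155 after only 39 additions.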
If we actually trace through the recursive calls made by MemFibo, we find that the array F[ ]
gets filled from the bottom up: first F [2], then F [3], and so on, up to F [n]. Once we realize this,
we can replace the recursion with a simple for-loop that just fills up the array in that order, instead
of relying on the complicated recursion to do it for us. This gives us our first explicit dynamic
programming algorithm.
IterFibo(n):
  F[0] ← 0
  F[1] ← 1
  for i ← 2 to n
    F[i] ← F[i − 1] + F[i − 2]
  return F[n]
IterFibo clearly takes only O(n) time and O(n) space to compute Fn, an exponential speedup
over our original recursive algorithm. We can reduce the space to O(1) by noticing that we never
need more than the last two elements of the array:
IterFibo2(n):
  prev ← 1
  curr ← 0
  for i ← 1 to n
    next ← curr + prev
    prev ← curr
    curr ← next
  return curr
(This algorithm uses the non-standard but perfectly consistent base case F−1 = 1.)
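A direct Python transcription of IterFibo2 (a sketch; the function name is mine):

def iter_fibo2(n):
    # prev and curr play the roles of F(i-1) and F(i); the loop runs n times
    # and uses only O(1) extra space.
    prev, curr = 1, 0      # non-standard base case F(-1) = 1, F(0) = 0
    for _ in range(n):
        prev, curr = curr, prev + curr
    return curr

For example, iter_fibo2(10) returns 55.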
But even this isn’t the fastest algorithm for computing Fibonacci numbers. There’s a faster
algorithm defined in terms of matrix multiplication, using the following wonderful fact:
  [ 0 1 ] [ x ]   [   y   ]
  [ 1 1 ] [ y ] = [ x + y ]
2 “My name is Elmer J. Fudd, millionaire. I own a mansion and a yacht.”
In other words, multiplying a two-dimensional vector by the matrix [ 01 11 ] does exactly the same
thing as one iteration of the inner loop of IterFibo2. This might lead us to believe that multiplying
by the matrix n times is the same as iterating the loop n times:
  [ 0 1 ]^n [ 1 ]   [ Fn−1 ]
  [ 1 1 ]   [ 0 ] = [  Fn  ]
A quick inductive argument proves this. So if we want to compute the nth Fibonacci number, all
we have to do is compute the nth power of the matrix [ 01 11 ].
If we use repeated squaring, computing the nth power of something requires only O(log n)
multiplications. In this case, that means O(log n) 2 × 2 matrix multiplications, but each matrix
multiplication can be done with only a constant number of integer multiplications and additions.
Thus, we can compute Fn in only O(log n) integer arithmetic operations.
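The following Python sketch implements this idea with explicit repeated squaring of 2 × 2 matrices;
the helper names are mine, not the notes'.

def mat_mult(A, B):
    # Multiply two 2x2 matrices, each given as a pair of rows.
    return ((A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
            (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]))

def mat_pow(A, n):
    # Compute A^n with repeated squaring: O(log n) matrix multiplications.
    result = ((1, 0), (0, 1))          # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mult(result, A)
        A = mat_mult(A, A)
        n >>= 1
    return result

def matrix_fibo(n):
    # [[0,1],[1,1]]^n applied to the vector (1, 0) gives (F(n-1), F(n)).
    return mat_pow(((0, 1), (1, 1)), n)[1][0]

For example, matrix_fibo(10) returns 55.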
This is an exponential speedup over the standard iterative algorithm, which was already an
exponential speedup over our original recursive algorithm. Right? (Strictly speaking, we have been
counting arithmetic operations rather than bit operations. Since Fn has Θ(n) bits, each addition in
the original recursive algorithm really takes O(n) time, so its running time satisfies the recurrence

  T(n) = T(n − 1) + T(n − 2) + O(n),

which is still Θ(φ^n).)
In general, dynamic programming algorithms are developed in two distinct stages:
1. Formulate the problem recursively. Write down a formula for the whole problem as a
simple combination of the answers to smaller subproblems.
2. Build solutions to your recurrence from the bottom up. Write an algorithm that starts with
the base cases of your recurrence and works its way up to the final solution by considering
the intermediate subproblems in the correct order. This is usually easier than the first step.
Of course, you have to prove that each of these steps is correct. If your recurrence is wrong, or if
you try to build up answers in the wrong order, your algorithm won’t work!
Dynamic programming algorithms store the solutions of intermediate subproblems, often but
not always in some kind of array or table. One common mistake that lots of students make
is to be distracted by the table (because tables are easy and familiar) and miss the much more
important (and difficult) task of finding a correct recurrence. Dynamic programming isn’t about
filling in tables; it’s about smart recursion. As long as we memoize the correct recurrence, an
explicit table isn’t necessary, but if the recursion is incorrect, nothing works.
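This "smart recursion" point of view can be taken quite literally in code. For instance, here is the
Fibonacci recurrence memoized in Python with no explicit table at all, just a cache bolted onto the
recursion (a sketch, using the standard functools library):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibo(n):
    # The recurrence itself; lru_cache remembers every value already computed.
    return n if n < 2 else fibo(n - 1) + fibo(n - 2)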
F O O D
M O N E Y
It’s fairly obvious that you can’t get from FOOD to MONEY in three steps, so their edit distance is
exactly four. Unfortunately, this is not so easy in general. Here’s a longer example, showing that
the distance between ALGORITHM and ALTRUISTIC is at most six. Is this optimal?
A L G O R I T H M
A L T R U I S T I C
To develop a dynamic programming algorithm to compute the edit distance between two strings,
we first need to develop a recursive definition. Let’s say we have an m-character string A and an
n-character string B. Then define E(i, j) to be the edit distance between the first i characters of A
and the first j characters of B. The edit distance between the entire strings A and B is E(m, n).
This gap representation for edit sequences has a crucial “optimal substructure” property. Sup-
pose we have the gap representation for the shortest edit sequence for two strings. If we remove
the last column, the remaining columns must represent the shortest edit sequence for the
remaining substrings. We can easily prove this by contradiction. If the substrings had a shorter
edit sequence, we could just glue the last column back on and get a shorter edit sequence for the
original strings. Once we figure out what should go in the last column, the Recursion Fairy will
magically give us the rest of the optimal gap representation.
There are a couple of obvious base cases. The only way to convert the empty string into a string
of j characters is by performing j insertions, and the only way to convert a string of i characters
into the empty string is with i deletions:
E(i, 0) = i, E(0, j) = j.
If neither string is empty, there are three possibilities for the last column in the shortest edit
sequence:
• Insertion: The last entry in the bottom row is empty. In this case, E(i, j) = E(i − 1, j) + 1.
• Deletion: The last entry in the top row is empty. In this case, E(i, j) = E(i, j − 1) + 1.
• Substitution: Both rows have characters in the last column. If the characters are the same,
we don’t actually have to pay for the substitution, so E(i, j) = E(i − 1, j − 1). If the characters
are different, then E(i, j) = E(i − 1, j − 1) + 1.
To summarize, the edit distance E(i, j) is the smallest of these three possibilities:4

  E(i, j) = min { E(i − 1, j) + 1,
                  E(i, j − 1) + 1,
                  E(i − 1, j − 1) + [A[i] ≠ B[j]] }
If we turned this recurrence directly into a recursive algorithm, we would have the following
double recurrence for the running time:

  T(m, n) = O(1)                                                   if n = 0 or m = 0,
            T(m, n − 1) + T(m − 1, n) + T(m − 1, n − 1) + O(1)     otherwise.
4 Once again, I'm using Iverson's bracket notation [P] to denote the indicator variable for the logical proposition P,
which has value 1 if P is true and 0 if P is false.
I don't know of a general closed-form solution for this mess, but we can derive an upper bound by
defining a new function

  T′(N) = max_{n+m=N} T(n, m) = O(1)                              if N = 0,
                                2T′(N − 1) + T′(N − 2) + O(1)     otherwise.

The annihilator method implies that T′(N) = O((1 + √2)^N). Thus, the running time of our recursive
edit-distance algorithm is at most T′(n + m) = O((1 + √2)^(n+m)).
We can bring the running time of this algorithm down to a polynomial by building an m × n
table of all possible values of E(i, j). We begin by filling in the base cases, the entries in the 0th
row and 0th column, each in constant time. To fill in any other entry, we need to know the values
directly above it, directly to the left, and both above and to the left. If we fill in our table in the
standard way—row by row from top down, each row from left to right—then whenever we reach
an entry in the matrix, the entries it depends on are already available.
Since there are Θ(mn) entries in the table, and each entry takes Θ(1) time once we know its
predecessors, the total running time is Θ(mn). The algorithm uses O(mn) space.
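A Python sketch of this table-filling algorithm (0-based indices; the function name is mine, not the
notes'):

def edit_distance(A, B):
    m, n = len(A), len(B)
    # E[i][j] = edit distance between the first i characters of A
    #           and the first j characters of B.
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i
    for j in range(n + 1):
        E[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(E[i - 1][j] + 1,      # drop the last character of A
                          E[i][j - 1] + 1,      # drop the last character of B
                          E[i - 1][j - 1] + (A[i - 1] != B[j - 1]))  # substitution, free if equal
    return E[m][n]

For example, edit_distance("ALGORITHM", "ALTRUISTIC") returns 6, matching the table below.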
Here’s the resulting table for ALGORITHM → ALTRUISTIC. Bold numbers indicate places where
characters in the two strings are equal. The arrows represent the predecessor(s) that actually define
each entry. Each direction of arrow corresponds to a different edit operation: horizontal=deletion,
vertical=insertion, and diagonal=substitution. Bold diagonal arrows indicate “free” substitutions
of a letter for itself. Any path of arrows from the top left corner to the bottom right corner of
this table represents an optimal edit sequence between the two strings. (There can be many such
paths.) Moreover, since we can compute these arrows in a postprocessing phase from the values
stored in the table, we can reconstruct the actual optimal editing sequence in O(n + m) additional
time.
         A   L   G   O   R   I   T   H   M
     0   1   2   3   4   5   6   7   8   9
 A   1   0   1   2   3   4   5   6   7   8
 L   2   1   0   1   2   3   4   5   6   7
 T   3   2   1   1   2   3   4   4   5   6
 R   4   3   2   2   2   2   3   4   5   6
 U   5   4   3   3   3   3   3   4   5   6
 I   6   5   4   4   4   4   3   4   5   6
 S   7   6   5   5   5   5   4   4   5   6
 T   8   7   6   6   6   6   5   4   5   6
 I   9   8   7   7   7   7   6   5   5   6
 C  10   9   8   8   8   8   7   6   6   6

[The original figure also draws arrows indicating each entry's defining predecessor(s) and prints
the "free" substitutions in bold.]
The edit distance between ALGORITHM and ALTRUISTIC is indeed six. There are three paths
through this table from the top left to the bottom right, so there are three optimal edit sequences:
[Figure: the three optimal alignments of ALGORITHM and ALTRUISTIC, each with a different
placement of gaps.]
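One optimal alignment can be recovered by walking backwards through the table, at each entry
following any predecessor that defines its value. Here is a Python sketch that rebuilds the table and
then performs this walk (names and tie-breaking are mine):

def one_optimal_alignment(A, B):
    m, n = len(A), len(B)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i
    for j in range(n + 1):
        E[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(E[i - 1][j] + 1, E[i][j - 1] + 1,
                          E[i - 1][j - 1] + (A[i - 1] != B[j - 1]))
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        # Prefer the diagonal predecessor, then the one directly above.
        if i > 0 and j > 0 and E[i][j] == E[i - 1][j - 1] + (A[i - 1] != B[j - 1]):
            top.append(A[i - 1]); bottom.append(B[j - 1]); i -= 1; j -= 1
        elif i > 0 and E[i][j] == E[i - 1][j] + 1:
            top.append(A[i - 1]); bottom.append('_'); i -= 1
        else:
            top.append('_'); bottom.append(B[j - 1]); j -= 1
    return ''.join(reversed(top)), ''.join(reversed(bottom))

For example, one_optimal_alignment("FOOD", "MONEY") returns ('FO_OD', 'MONEY'), an alignment
whose five columns contain exactly four edits.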
A simple inductive argument implies that Half(m, n) is the correct value of h. We can easily modify
our earlier algorithm so that it computes Half(m, n) at the same time as the edit distance Edit(m, n),
all in O(mn) time, using only O(m) space.
Now, to compute the optimal editing sequence that transforms A into B, we recursively compute
the optimal subsequences. The recursion bottoms out when one string has only constant length,
in which case we can determine the optimal editing sequence by our old dynamic programming
algorithm. Overall, the running time of our recursive algorithm satisfies the following recurrence:

  T(m, n) = O(n)                                      if m ≤ 1,
            O(m)                                      if n ≤ 1,
            O(mn) + T(m/2, h) + T(m/2, n − h)         otherwise.
It’s easy to prove inductively that T (m, n) = O(mn), no matter what the value of h is. Specifically,
the entire algorithm’s running time is at most twice the time for the initial dynamic programming
phase.
A similar inductive argument implies that the algorithm uses only O(n + m) space.
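To see where the O(mn) time bound comes from, unroll the recurrence, writing c for the constant
hidden in the O(mn) term and ignoring the O(n) and O(m) base cases:

  T(m, n) ≤ cmn + c(m/2)h + c(m/2)(n − h) + · · · ≤ cmn · (1 + 1/2 + 1/4 + · · ·) = 2cmn.

Each level of recursion costs at most half as much as the level above it, so the whole computation
costs at most twice the top-level dynamic programming phase.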
To put this recurrence in more standard form, fix the frequency array f , and let S(i, j) denote the
total search time in the optimal search tree for the subarray A[i .. j]. To simplify notation a bit, let
F (i, j) denote the total frequency count for all the keys in the interval A[i .. j]:
  F(i, j) = ∑_{k=i}^{j} f[k] = f[i] + f[i + 1] + · · · + f[j]
The base case might look a little weird, but all it means is that the total cost for searching an empty
set of keys is zero.
The algorithm will be somewhat simpler and more efficient if we precompute all possible values
of F(i, j) and store them in an array. Computing each value F(i, j) using a separate for-loop would
take O(n³) time. A better approach is to turn the recurrence

  F(i, j) = f[i]                    if i = j,
            F(i, j − 1) + f[j]      otherwise

into a short dynamic programming precomputation that fills in each row of the array F from left
to right.
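A minimal Python sketch of this precomputation (0-based indices; the name init_f and the nested
lists are my choices, not the notes'):

def init_f(f):
    # F[i][j] = f[i] + f[i+1] + ... + f[j], filled in one row at a time,
    # left to right, in O(1) time per entry and O(n^2) time overall.
    n = len(f)
    F = [[0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f[i]
        for j in range(i + 1, n):
            F[i][j] = F[i][j - 1] + f[j]
    return F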
We could also traverse the array row by row from the bottom up, traversing each row from
left to right, or column by column from left to right, traversing each column from the bottom up.
These two orders give us the following algorithms:
OptimalSearchTree2(f[1 .. n]):
  InitF(f[1 .. n])
  for i ← n downto 1
    S[i, i − 1] ← 0
    for j ← i to n
      ComputeS(i, j)
  return S[1, n]

OptimalSearchTree3(f[1 .. n]):
  InitF(f[1 .. n])
  for j ← 0 to n
    S[j + 1, j] ← 0
    for i ← j downto 1
      ComputeS(i, j)
  return S[1, n]
No matter which of these three orders we actually use, the resulting algorithm runs in Θ(n3 ) time
and uses Θ(n2 ) space .
We could have predicted this from the original recursive formulation:

  S(i, j) = 0                                                         if j = i − 1,
            F(i, j) + min_{i ≤ r ≤ j} ( S(i, r − 1) + S(r + 1, j) )   otherwise.
First, the function has two arguments, each of which can take on any value between 1 and n, so we
probably need a table of size O(n2 ). Next, there are three variables in the recurrence (i, j, and r),
each of which can take any value between 1 and n, so it should take us O(n3 ) time to fill the table.
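Here is a Python sketch of the whole computation in the row-by-row, bottom-up order; the 1-based
padding, the function name, and the example below are mine, not the notes'.

def optimal_search_tree_cost(f):
    # f[1..n] are access frequencies; f[0] is an unused dummy entry.
    n = len(f) - 1
    # F[i][j] = f[i] + ... + f[j];  S[i][j] = optimal total search cost for keys i..j.
    F = [[0] * (n + 2) for _ in range(n + 2)]
    S = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        F[i][i] = f[i]
        for j in range(i + 1, n + 1):
            F[i][j] = F[i][j - 1] + f[j]
    for i in range(n, 0, -1):              # S[i][i-1] is already 0 (empty subtree)
        for j in range(i, n + 1):
            S[i][j] = F[i][j] + min(S[i][r - 1] + S[r + 1][j] for r in range(i, j + 1))
    return S[1][n]

For example, optimal_search_tree_cost([0, 2, 8, 5]) returns 22, achieved by putting the middle key
(the one with frequency 8) at the root.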
In general, you can get an easy estimate of the time and space bounds for any dynamic program-
ming algorithm by looking at the recurrence. The time bound is determined by how many values
all the variables can have, and the space bound is determined by how many values the parameters
of the function can have. For example, the (completely made up) recurrence
  F(i, j, k, l, m) = min_{0 ≤ p ≤ i}  max_{0 ≤ q ≤ j}  ∑_{r=1}^{k−m} F(i − p, j − q, r, l − 1, m − r)
It's not hard to see that r increases monotonically from i to n during each iteration of the out-
ermost for loop. Consequently, the innermost for loop iterates at most n times during a single
iteration of the outermost loop, so the total running time of the algorithm is O(n²).
If we formulate the problem slightly differently, this algorithm can be improved even further.
Suppose we require the optimum external binary tree, where the keys A[1 .. n] are all stored at the
leaves, and intermediate pivot values are stored at the internal nodes. An algorithm due to Te Ching
Hu and Alan Tucker5 computes the optimal binary search tree in this setting in only O(n log n) time!
There are several different ways to triangulate any convex polygon. Suppose we want to find
the triangulation that requires the least amount of ink to draw, or in other words, the triangulation
where the total perimeter of the triangles is as small as possible. To make things concrete, let’s label
the corners of the polygon from 1 to n, starting at the bottom of the polygon and going clockwise.
We’ll need the following subroutines to compute the perimeter of a triangle joining three corners
using their x- and y-coordinates:
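Here is a hypothetical Python stand-in for those subroutines, assuming arrays X[ ] and Y[ ] of
corner coordinates (the names are mine):

from math import sqrt

def dist(X, Y, i, j):
    # Euclidean distance between corners i and j.
    return sqrt((X[i] - X[j]) ** 2 + (Y[i] - Y[j]) ** 2)

def delta(X, Y, i, j, k):
    # Perimeter of the triangle joining corners i, j, and k.
    return dist(X, Y, i, j) + dist(X, Y, j, k) + dist(X, Y, i, k)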
[Figure: a convex polygon with labeled corners and several of its triangulations.]
Building on this recursive definition, we can now recursively define the total length of the
minimum-length triangulation. In the best triangulation, if we remove the 'base' triangle, what
remains must be the optimal triangulation of the two smaller polygons. So we just have to choose
the best triangle to attach to the first and last corners, and let the Recursion Fairy take care of the rest:
  M(i, j) = 0                                                      if j = i + 1,
            min_{i < k < j} ( ∆(i, j, k) + M(i, k) + M(k, j) )     otherwise.
As in the optimal search tree problem, each table entry M [i, j] depends on all the entries directly
to the left or directly below, so we can use any of the orders described earlier to fill the table.
MinTriangulation:
  for i ← 1 to n − 1
    M[i, i + 1] ← 0
  for d ← 2 to n − 1
    for i ← 1 to n − d
      ComputeM(i, i + d)
  return M[1, n]

MinTriangulation2:
  for i ← n downto 1
    M[i, i + 1] ← 0
    for j ← i + 2 to n
      ComputeM(i, j)
  return M[1, n]

MinTriangulation3:
  for j ← 2 to n
    M[j − 1, j] ← 0
    for i ← j − 1 downto 1
      ComputeM(i, j)
  return M[1, n]
In all three cases, the algorithm runs in Θ(n3 ) time and uses Θ(n2 ) space, just as we should have
guessed from the recurrence.
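A Python sketch of the diagonal-order computation, using the delta subroutine sketched earlier
(the function name is mine):

def min_triangulation(X, Y):
    # Corners are numbered 1..n; index 0 of X and Y is unused padding.
    n = len(X) - 1
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    for d in range(2, n):                  # M[i][i+1] is already 0
        for i in range(1, n - d + 1):
            j = i + d
            M[i][j] = min(delta(X, Y, i, j, k) + M[i][k] + M[k][j]
                          for k in range(i + 1, j))
    return M[1][n]

For the unit square, with X = [0, 0, 1, 1, 0] and Y = [0, 0, 0, 1, 1], both triangulations tie, and the
function returns 4 + 2√2 ≈ 6.828.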
A polygon triangulation and the corresponding binary tree. (Squares represent null pointers.)
A third problem that fits into the same mold is the infamous matrix chain multiplication prob-
lem. Using the standard algorithm, we can multiply a p × q matrix by a q × r matrix using O(pqr)
arithmetic operations; the result is a p × r matrix. If we have three matrices to multiply, the cost
depends on which pair we multiply first. For example, suppose A and C are 1000 × 2 matrices and
B is a 2 × 1000 matrix. There are two different ways to compute the threefold product ABC:
• (AB)C: Computing AB takes 1000·2·1000 = 2 000 000 operations and produces a 1000×1000
matrix. Multiplying this matrix by C takes 1000 · 1000 · 2 = 2 000 000 additional operations.
So the total cost of (AB)C is 4 000 000 operations.
• A(BC): Computing BC takes 2 · 1000 · 2 = 4000 operations and produces a 2 × 2 matrix.
Multiplying A by this matrix takes 1000 · 2 · 2 = 4000 additional operations. So the total cost
of A(BC) is only 8000 operations.
Now suppose we are given an array D[0 .. n] as input, indicating that each matrix Mi has D[i − 1]
rows and D[i] columns. We have an exponential number of possible ways to compute the n-fold
product M1 M2 · · · Mn. The following dynamic programming algorithm computes the number of
arithmetic operations for the best possible parenthesization:
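Here is a Python sketch of one standard formulation of that computation, minimizing over the last
multiplication performed; the function name and 1-based layout are mine, not the notes'.

def matrix_chain_cost(D):
    # D has n+1 entries; matrix i has D[i-1] rows and D[i] columns.
    n = len(D) - 1
    # cost[i][j] = minimum number of operations to compute the product of matrices i..j.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for d in range(1, n):                  # d = j - i
        for i in range(1, n - d + 1):
            j = i + d
            cost[i][j] = min(cost[i][k] + cost[k + 1][j] + D[i - 1] * D[k] * D[j]
                             for k in range(i, j))
    return cost[1][n]

For the example above, matrix_chain_cost([1000, 2, 1000, 2]) returns 8000, the cost of the A(BC)
order.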
Observe that there are only nT possible values for the input parameters that lead to the interesting
case of this recurrence, so storing the results of all such subproblems requires O(nT) space. If
S(i + 1, t) and S(i + 1, t − X[i]) are already known, we can compute S(i, t) in constant time, so
memoizing this recurrence gives us an algorithm that runs in O(nT) time.6 To turn this into an
explicit dynamic programming algorithm, we only need to consider the subproblems S(i, t) in the
proper order:
SubsetSum(X[1 .. n], T):
  S[n + 1, 0] ← True
  for t ← 1 to T
    S[n + 1, t] ← False
  for i ← n downto 1
    S[i, 0] ← True
    for t ← 1 to X[i] − 1
      S[i, t] ← S[i + 1, t]    ⟨⟨Avoid the case t < 0⟩⟩
    for t ← X[i] to T
      S[i, t] ← S[i + 1, t] ∨ S[i + 1, t − X[i]]
  return S[1, T]
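The same computation in Python, collapsed to a single boolean row: row i of the table depends only
on row i + 1, at the same or smaller values of t, so we can overwrite one row in place as long as we
sweep t downward (0-based list X; this space-saving variant is a sketch, not the notes' code):

def subset_sum(X, T):
    # S[t] = True if some subset of the elements considered so far sums to t.
    S = [False] * (T + 1)
    S[0] = True                            # the empty set sums to 0
    for i in range(len(X) - 1, -1, -1):    # consider X[i], X[i+1], ..., X[n-1]
        for t in range(T, X[i] - 1, -1):   # downward, so each element is used at most once
            S[t] = S[t] or S[t - X[i]]
    return S[T]

For example, subset_sum([8, 6, 7, 5, 3, 10, 9], 15) returns True (8 + 7 = 15, for instance).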
This direct algorithm clearly always uses O(nT) time and space. In particular, if T is significantly
larger than 2ⁿ, this algorithm is actually slower than our naïve recursive algorithm. Dynamic
programming isn't always an improvement!
6 This does not contradict our earlier upper bound of O(2ⁿ). Both upper bounds are correct. Which bound is actually
better depends on the size of T.
The recurrence suggests that our algorithm should use O(n2 ) time and space, since the input param-
eters i and j each can take n different values. To get an explicit dynamic programming algorithm,
we only need to ensure that both L(i, j + 1) and L(j, j + 1) are considered before L(i, j), for all i
and j.
LIS(A[1 .. n]):
  A[0] ← −∞                  ⟨⟨Add a sentinel⟩⟩
  for i ← 0 to n             ⟨⟨Base cases⟩⟩
    L[i, n + 1] ← 0
  for j ← n downto 1
    for i ← 0 to j − 1
      if A[i] ≥ A[j]
        L[i, j] ← L[i, j + 1]
      else
        L[i, j] ← max{L[i, j + 1], 1 + L[j, j + 1]}
  return L[0, 1] − 1         ⟨⟨Don't count the sentinel⟩⟩
As predicted, this algorithm clearly uses O(n2 ) time and space . We can reduce the space to O(n)
by only maintaining the two most recent columns of the table, L[·, j] and L[·, j + 1].
This is not the only recursive strategy we could use for computing longest increasing subse-
quences. Here is another recurrence that gives us the O(n) space bound for free. Let L′(i) denote
the length of the longest increasing subsequence of A[i .. n] that starts with A[i]. Our goal is to
compute L′(0) − 1. To define L′(i) recursively, we only need to specify the second element in the
subsequence; the Recursion Fairy will do the rest:

  L′(i) = 1 + max { L′(j) | j > i and A[j] > A[i] }

Here, I'm assuming that max ∅ = 0, so that the base case L′(n) = 1 falls out of the recurrence
automatically. Memoizing this recurrence requires O(n) space, and the resulting algorithm runs in
O(n²) time. To transform this into a dynamic programming algorithm, we only need to guarantee
that L′(j) is computed before L′(i) whenever i < j.
LIS2(A[1 .. n]):
  A[0] ← −∞                  ⟨⟨Add a sentinel⟩⟩
  for i ← n downto 0
    L′[i] ← 1
    for j ← i + 1 to n
      if A[j] > A[i] and 1 + L′[j] > L′[i]
        L′[i] ← 1 + L′[j]
  return L′[0] − 1           ⟨⟨Don't count the sentinel⟩⟩
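A Python sketch of the same quadratic-time computation, without the sentinel and 0-based (the
function name is mine):

def longest_increasing_subsequence(A):
    n = len(A)
    # L[i] = length of the longest increasing subsequence of A[i:] that starts with A[i].
    L = [1] * n
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if A[j] > A[i] and 1 + L[j] > L[i]:
                L[i] = 1 + L[j]
    return max(L, default=0)

For example, longest_increasing_subsequence([3, 1, 4, 1, 5, 9, 2, 6]) returns 4.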