CSCI 3104, CU-Boulder, Profs. Clauset & Grochow
Lecture 7, Spring 2018

1 Aligning sequences with dynamic programming

Suppose we can represent some user input as a sequence of symbols x, where $x_i \in \Sigma$ and Σ denotes the input alphabet. Our task is to identify the best match of that input within a library of sequences Y = {y}. However, users are prone to input errors, which means that x may not exactly match any sequence in our reference set. Thus, to correctly identify the user's input, we must first determine which y ∈ Y is the best match for the input x.

Examples of this type of problem are more common than we might imagine. For instance, search
engines often try to correct misspelled words in the query string; in speech recognition, the string
is a sequence of values representing the recorded sound wave, which must be matched to a known
word; and in forensic analysis, the input string is a DNA sequence, which may differ from those in
our reference set by some number of nucleic acid mutations, insertions or deletions.

In each case, we aim to align a pair of sequences so that we find the elements in each that correspond exactly to each other, while paying for the edits needed to reconcile the elements between these aligned parts. Here, we will focus on what is called a global alignment, in which we aim to align the two sequences in their entirety. To define an algorithm for finding such an alignment, we must also define a set of edit operations $\mathcal{E}$ and a cost $c(E) \ge 0$ for each operation $E \in \mathcal{E}$.

The problem of sequence alignment is to find a minimum-cost sequence of edit operations that transforms the sequence x into the sequence y. We will solve this problem using a dynamic programming algorithm.

1.1 Edit operations and costs

Before we can write down an algorithm, we must define the set of edit operations $\mathcal{E}$ we may use. Here, we will use three operations beyond the default “no-op” operation, which leaves a letter unchanged.

• Substitution (sub): replace a letter $x_i$ with some other letter in the alphabet Σ, at the same position as $x_i$.
For instance, “so” and “do” are two strings that differ by a single substitution, and they are commonly mistyped for each other because s and d are next to each other on a keyboard.

• Insertion and Deletion (indel): insert some letter from the alphabet Σ into x, shifting all subsequent letters one position later in the string; or delete $x_i$ from x, shifting all subsequent letters one position earlier in the string. Note that an insertion into one string is equivalent to a deletion from the other string.
For instance, “grande” and “grand” are two strings that differ by a single indel operation.


• Transposition (swap): take two consecutive letters $x_i, x_{i+1}$ and exchange their positions, and then substitute them into the aligned positions in y.[1]
For instance, both “their” / “thier” and “teh” / “the” are pairs of strings that differ by a single transposition.
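The three edit operations can be sketched as plain string functions. This is a minimal illustration only: the function names are hypothetical, and positions are 0-based here, whereas the text indexes letters from 1.

```python
def substitute(s: str, i: int, letter: str) -> str:
    """sub: replace the letter at position i with `letter`."""
    return s[:i] + letter + s[i + 1:]


def insert(s: str, i: int, letter: str) -> str:
    """indel (insertion): put `letter` at position i, shifting later letters right."""
    return s[:i] + letter + s[i:]


def delete(s: str, i: int) -> str:
    """indel (deletion): remove the letter at position i, shifting later letters left."""
    return s[:i] + s[i + 1:]


def swap(s: str, i: int) -> str:
    """Transposition: exchange the adjacent letters at positions i and i + 1."""
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


print(substitute("so", 0, "d"))  # do
print(delete("grande", 5))       # grand
print(insert("grand", 5, "e"))   # grande
print(swap("thier", 2))          # their
print(swap("teh", 1))            # the
```

Each call performs exactly one edit, so each pair of example strings from the list above is one operation apart.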

Given these operations, we must now also choose a cost function c(E). There are several choices for this function, but here we choose the “edit distance” function (technically called the Damerau-Levenshtein distance),[2] which simply counts the number of these operations required to transform x into y.[3] The one wrinkle is that we treat a transposition as three operations: one swap followed by two subs, for a total cost of 3, while any single sub or indel costs 1. (The standard Damerau-Levenshtein distance instead charges 1 for transposing two adjacent letters; with a cost of 3, a swap is never cheaper than the two subs, costing at most 2, that achieve the same effect.)

1.2 An example
To illustrate how to compute the cost of a particular alignment, consider aligning the two strings
x = THEIR and y = THERE.

Alignment 1: Substitute the last two characters, for a total cost of 2 sub operations:

THEIR
|||ss
THERE

Alignment 2: Insert and delete so that the R lines up, for a total cost of 2 indel operations:

THEIR-
|||d|i
THE-RE

where “-” denotes a “gap” character, implying an insertion on the opposing string.

Alignment 3: At worst, delete the entire first string and insert the entire second string, for a total cost of 10 indel operations:

THEIR-----
dddddiiiii
-----THERE

Clearly, the first two alignments are cheaper than the third alignment, and under the edit-distance cost function, either of those would be an acceptable alignment.

[1] Generalizations exist that allow letters more than one position apart to be transposed, or allow longer substrings to be transposed, but these algorithms are more complicated.
[2] Supposedly, these types of “edits” represent a large fraction, possibly 80% or more, of all human misspellings, with the remainder presumably being confusion over which word to use in the first place, e.g., “their” versus “they’re”.
[3] Other cost structures are certainly possible, depending on the application. For instance, a transposition might be less costly than an insertion, etc. Furthermore, the cost may depend on the letters being changed, perhaps reflecting the probability of the error. For instance, adjacent letters on a QWERTY keyboard may have lower costs for substitution or transposition than letters far apart.
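One way to check the costs of the three alignments above is to score a gapped alignment column by column. The sketch below assumes the unit costs of Section 1.1 (sub = indel = 1, a matching column = 0) and the “-” gap convention; `alignment_cost` is a hypothetical helper, and it does not model transpositions.

```python
GAP = "-"


def alignment_cost(top: str, bottom: str) -> int:
    """Edit-distance cost of a gapped alignment (sub = indel = 1, match = 0)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    cost = 0
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot contain two gaps")
        if a == GAP or b == GAP:
            cost += 1          # indel: a letter aligned with a gap
        elif a != b:
            cost += 1          # sub: two different letters
    return cost


print(alignment_cost("THEIR", "THERE"))            # 2
print(alignment_cost("THEIR-", "THE-RE"))          # 2
print(alignment_cost("THEIR-----", "-----THERE"))  # 10
```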

1.3 When can we apply dynamic programming?

Recall that in dynamic programming, we assemble the solution to a larger problem by using the exact solutions to smaller problems contained within it. In general, the relationship between a problem and its subproblems defines a recursive structure that we can use to build the full solution in a “bottom-up” fashion.[4]

A general requirement for dynamic programming is that there cannot be a cycle among subproblem
dependencies, such that solving some problem A requires eventually solving some B that requires
solving A. Thus, dynamic programming can be applied only if the space of subproblems can be
organized into a directed acyclic graph (a “DAG”), in which each subproblem is a vertex and an
arc i → j represents that solving j requires solving i first.
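For sequence alignment the subproblem graph is a DAG, and visiting the cells of S row by row is one valid topological order. The sketch below assumes Python 3.9+ for the standard-library `graphlib` module; `alignment_dag` is a hypothetical helper that builds the dependency graph for the four kinds of predecessor used in Section 1.4, and `TopologicalSorter` then finds an evaluation order (or raises `CycleError` when the dependencies contain a cycle).

```python
from graphlib import CycleError, TopologicalSorter


def alignment_dag(nx: int, ny: int) -> dict:
    """Map each subproblem (i, j) to the set of subproblems it depends on."""
    graph = {}
    for i in range(nx + 1):
        for j in range(ny + 1):
            deps = set()
            if i >= 1 and j >= 1:
                deps.add((i - 1, j - 1))   # sub
            if i >= 1:
                deps.add((i - 1, j))       # indel
            if j >= 1:
                deps.add((i, j - 1))       # indel
            if i >= 2 and j >= 2:
                deps.add((i - 2, j - 2))   # swap
            graph[(i, j)] = deps
    return graph


graph = alignment_dag(4, 3)
order = list(TopologicalSorter(graph).static_order())
print(order[0], order[-1])   # (0, 0) (4, 3)

# Row-major order is also valid: every dependency compares smaller as a tuple.
print(all(dep < node for node, deps in graph.items() for dep in deps))  # True

try:
    list(TopologicalSorter({"A": {"B"}, "B": {"A"}}).static_order())
except CycleError:
    print("cycle detected: no valid evaluation order")
```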

1.4 Dynamic programming solution

The ordered substructure in sequence alignment comes from the additive cost of making additional edit operations as we move from left to right through the sequences. That is, the cost of aligning two prefixes $x_1 x_2 \ldots x_i = x_{1\ldots i}$ and $y_1 y_2 \ldots y_j = y_{1\ldots j}$ is the cost of the edit operation for $x_i$ and $y_j$ plus the cost of aligning the subproblem that got us to needing to align $x_i$ and $y_j$.

There are only three ways we could have gotten to needing to align $x_i$ and $y_j$:

• the last op was sub, and we paid the cost of aligning $x_{1\ldots i-1}$ and $y_{1\ldots j-1}$,

• the last op was indel, and we paid the cost of aligning either $x_{1\ldots i}$ and $y_{1\ldots j-1}$, or $x_{1\ldots i-1}$ and $y_{1\ldots j}$, or

• the last op was swap, and we paid the cost of aligning $x_{1\ldots i-2}$ and $y_{1\ldots j-2}$.

Let cost(i, j) be the minimum cost of aligning $x_{1\ldots i}$ and $y_{1\ldots j}$, where we define as a base case cost(0, 0) = 0.
[4] There are additional requirements for dynamic programming to produce a polynomial-time algorithm: the number of subproblems must be polynomial in the input size, and the recursive function must run in polynomial time.


Thus, the recursive structure of the subproblems we identified above implies that cost(i, j) may be computed recursively as

$$
\mathrm{cost}(i, j) = \min \begin{cases}
\mathrm{cost}(i-2, j-2) + c(\text{swap}) \\
\mathrm{cost}(i-1, j-1) + c(\text{sub}) \\
\mathrm{cost}(i-1, j) + c(\text{indel}) \\
\mathrm{cost}(i, j-1) + c(\text{indel})
\end{cases}
$$
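where we define c(sub) = 0 if $x_i = y_j$, i.e., a “no-op,” and where any option whose indices would be negative is omitted (so a swap is only possible when i ≥ 2 and j ≥ 2). This function is equivalent to the following DAG template, which represents the relationship between subproblems:

[Figure: DAG template. Vertex (i, j) has incoming arcs from (i − 2, j − 2), labeled swap; from (i − 1, j − 1), labeled sub; and from (i − 1, j) and (i, j − 1), each labeled indel.]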

By memoizing the solutions (costs) to the subproblems for $0 \le i \le n_x$ (the length of x) and $0 \le j \le n_y$ (the length of y), storing them in a 2-dimensional array S[i, j] = cost(i, j), we can recursively compute the minimum cost of aligning x and y.
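A minimal sketch of the memoized recursion, assuming the costs from Section 1.1 (c(sub) = c(indel) = 1, c(swap) = 3). It uses Python's `functools.lru_cache` as the memo table in place of an explicit array S, and simply skips options whose indices would be negative. Because it recurses, very long strings may exceed Python's default recursion limit.

```python
from functools import lru_cache

C_SUB, C_INDEL, C_SWAP = 1, 1, 3  # c(sub), c(indel), c(swap) from Section 1.1


def edit_distance(x: str, y: str) -> int:
    """Minimum cost of aligning x and y via the memoized recursion for cost(i, j)."""

    @lru_cache(maxsize=None)          # plays the role of the memo array S[i, j]
    def cost(i: int, j: int) -> int:
        if i == 0 and j == 0:
            return 0                  # base case: two empty strings
        options = []
        if i >= 2 and j >= 2:         # swap, then sub into y_{j-1} y_j
            options.append(cost(i - 2, j - 2) + C_SWAP)
        if i >= 1 and j >= 1:         # sub, or no-op when x_i == y_j
            c_sub = 0 if x[i - 1] == y[j - 1] else C_SUB   # x_i is x[i - 1]
            options.append(cost(i - 1, j - 1) + c_sub)
        if i >= 1:                    # indel: x_i aligned with a gap
            options.append(cost(i - 1, j) + C_INDEL)
        if j >= 1:                    # indel: y_j aligned with a gap
            options.append(cost(i, j - 1) + C_INDEL)
        return min(options)

    return cost(len(x), len(y))


print(edit_distance("THEIR", "THERE"))             # 2
print(edit_distance("STEP", "APE"))                # 3
print(edit_distance("EXPONENTIAL", "POLYNOMIAL"))  # 6
```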

1.5 A small and fully worked example

Before tackling a large example, let us exhaustively do a small one. Consider aligning x = STEP
and y = APE.

We begin by writing out the cost matrix[5] S and filling in the base case for aligning two empty strings, which has cost(0, 0) = 0.

We may now immediately fill in the values for the 0th column and 0th row, which correspond to the cost of aligning an empty string (written ε below) with a prefix of x (column 0) or of y (row 0). In each of these cases, the alignment consists of inserting each character of the target string into the empty string, and thus the costs in the 0th row are S(0, j) = j for $1 \le j \le n_y$, and the costs in the 0th column are S(i, 0) = i for $1 \le i \le n_x$.

[5] For convenience, we will assume this matrix is 0-indexed, meaning that the first element in a row or a column is the 0th element.

Base case:

x/y  ε  A  P  E
ε    0
S
T
E
P

Empty strings aligned:

x/y  ε  A  P  E
ε    0  1  2  3
S    1
T    2
E    3
P    4

First character aligned:

x/y  ε  A  P  E
ε    0  1  2  3
S    1  1  2  3
T    2  2
E    3  3
P    4  4
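The base-case fill can be sketched as follows; `init_cost_matrix` is a hypothetical helper, and `None` stands in for cells that have not been computed yet.

```python
def init_cost_matrix(x, y):
    """Cost matrix S with only the 0th row and 0th column filled in."""
    S = [[None] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        S[i][0] = i          # align x_1..i with the empty string: i indels
    for j in range(len(y) + 1):
        S[0][j] = j          # align the empty string with y_1..j: j indels
    return S


for row in init_cost_matrix("STEP", "APE"):
    print(row)
# [0, 1, 2, 3]
# [1, None, None, None]
# [2, None, None, None]
# [3, None, None, None]
# [4, None, None, None]
```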

At the next step, we set i = 1 and j = 1 and align $x_1$ = S with $y_1$ = A. There are three subproblems to consider (the fourth subproblem, corresponding to swap, isn’t allowed yet because it requires i, j ≥ 2):

• (Sub) We previously aligned ε (the empty string) from x with ε from y, for cost S(0, 0) = 0.
Now we substitute S for A, which costs c(sub) = 1.
Cost = 1.

• (Delete) We previously aligned ε from x with A from y, for cost S(0, 1) = 1.
Now we delete S, which costs c(indel) = 1.
Cost = 2.

• (Insert) We previously aligned S from x with ε from y, for cost S(1, 0) = 1.
Now we insert A, which costs c(indel) = 1.
Cost = 2.

The minimum of these choices is uniquely the first one, so we record S(1, 1) = 1.
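Each bullet list in this example enumerates the same candidate costs for one cell. As a sketch (the helper name `candidates` is hypothetical, and the costs are those from Section 1.1), the candidates can be computed from already-filled neighbors; here they are evaluated against the completed cost matrix of this example, which reproduces the lists for S(1, 1) above and S(2, 2) below.

```python
def candidates(S, x, y, i, j, c_sub=1, c_indel=1, c_swap=3):
    """Cost of each allowed last operation for cell (i, j) of the cost matrix S.

    x_i in the text is x[i - 1] here, because Python strings are 0-indexed.
    """
    options = {}
    if i >= 1 and j >= 1:
        options["sub"] = S[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else c_sub)
    if i >= 1:
        options["delete"] = S[i - 1][j] + c_indel   # delete x_i
    if j >= 1:
        options["insert"] = S[i][j - 1] + c_indel   # insert y_j
    if i >= 2 and j >= 2:
        options["swap"] = S[i - 2][j - 2] + c_swap
    return options


# Completed cost matrix for x = STEP, y = APE (rows: ε, S, T, E, P; columns: ε, A, P, E).
S_done = [
    [0, 1, 2, 3],
    [1, 1, 2, 3],
    [2, 2, 2, 3],
    [3, 3, 3, 2],
    [4, 4, 3, 3],
]

print(candidates(S_done, "STEP", "APE", 1, 1))  # {'sub': 1, 'delete': 2, 'insert': 2}
print(candidates(S_done, "STEP", "APE", 2, 2))  # {'sub': 2, 'delete': 3, 'insert': 3, 'swap': 3}
```

Taking the minimum of the returned values gives the entry recorded for that cell.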

Next we consider i = 1 and j ∈ {2, 3}, in which we align S with {AP, APE}. Although we could write down the subproblems for each of these, we may also simply recognize that S appears in neither of these strings, and thus the minimum cost for each alignment is the cost of substituting S for the first letter of $y_{1\ldots j}$ and inserting the remaining j − 1 letters. Thus, we may record S(1, j) = j for j ∈ {2, 3}.

The same reasoning applies for i ∈ {2, 3, 4} and j = 1, in which we align {ST, STE, STEP} with A: none of these contains A, so the best we can do is one substitution and i − 1 deletions. Thus, we may record S(i, 1) = i for i ∈ {2, 3, 4}. What remains is to align the remaining pairs of prefixes. We will treat each of the 6 cases, one at a time.


Set i = j = 2 and align ST with AP. There are four subproblems to consider:

• (Sub) Previously S → A. Now substitute T for P. Cost = S(1, 1) + 1 = 2

• (Delete) Previously S → AP. Now delete T. Cost = S(1, 2) + 1 = 3

• (Insert) Previously ST → A. Now insert P. Cost = S(2, 1) + 1 = 3

• (Swap) Previously ε → ε. Now transpose ST and sub for AP. Cost = S(0, 0) + 3 = 3

Thus, we record S(2, 2) = 2.

Now setting i = 2 and j = 3, we align ST with APE:

• (Sub) Previously S → AP. Now substitute T for E. Cost = S(1, 2) + 1 = 3

• (Delete) Previously S → APE. Now delete T. Cost = S(1, 3) + 1 = 4

• (Insert) Previously ST → AP. Now insert E. Cost = S(2, 2) + 1 = 3

• (Swap) Previously ε → A. Now transpose ST and sub for PE. Cost = S(0, 1) + 3 = 4

Thus, we record S(2, 3) = 3, which represents the cost of either of these subalignments:

S-T
sis
APE

ST-
ssi
APE

Now we set i = 3 and j = 2, in which we align STE with AP. Again, there are four subproblems to consider:

• (Sub) Previously ST → A. Now substitute E for P. Cost = S(2, 1) + 1 = 3

• (Delete) Previously ST → AP. Now delete E. Cost = S(2, 2) + 1 = 3

• (Insert) Previously STE → A. Now insert P. Cost = S(3, 1) + 1 = 4

• (Swap) Previously S → ε. Now transpose TE and sub for AP. Cost = S(1, 0) + 3 = 4

Thus, we record S(3, 2) = 3.


Now setting i = 3 and j = 3 and aligning STE with APE, we have:

• (Sub) Previously ST → AP. Now substitute E for E (a no-op). Cost = S(2, 2) + 0 = 2

• (Delete) Previously ST → APE. Now delete E (from x). Cost = S(2, 3) + 1 = 4

• (Insert) Previously STE → AP. Now insert E (into y). Cost = S(3, 2) + 1 = 4

• (Swap) Previously S → A. Now transpose TE and sub for PE. Cost = S(1, 1) + 3 = 4

Thus, we record S(3, 3) = 2.

Penultimately, we consider i = 4 and j = 2 and align STEP with AP:

• (Sub) Previously STE → A. Now substitute P for P (a no-op). Cost = S(3, 1) + 0 = 3

• (Delete) Previously STE → AP. Now delete P (from x). Cost = S(3, 2) + 1 = 4

• (Insert) Previously STEP → A. Now insert P (into y). Cost = S(4, 1) + 1 = 5

• (Swap) Previously ST → ε. Now transpose EP and sub for AP. Cost = S(2, 0) + 3 = 5

Thus, we record S(4, 2) = 3.

And finally, we set i = 4 and j = 3 and align STEP with APE:

• (Sub) Previously STE → AP. Now substitute P for E. Cost = S(3, 2) + 1 = 4

• (Delete) Previously STE → APE. Now delete P (from x). Cost = S(3, 3) + 1 = 3

• (Insert) Previously STEP → AP. Now insert E (into y). Cost = S(4, 2) + 1 = 4

• (Swap) Previously ST → A. Now transpose EP and sub for PE. Cost = S(2, 1) + 3 = 5

Thus, we record S(4, 3) = 3, which gives the final minimum cost for aligning STEP with APE. Only the delete option attains this minimum, via the alignment

STEP
ss|d
APE-

(If swapping two adjacent letters that then match exactly cost only 1, as in the standard Damerau-Levenshtein distance, the swap option here would cost S(2, 1) + 1 = 3 and would tie, so that

STEP    STEP
dstt    sdtt
-APE    A-PE

would also be minimum-cost alignments. Under the costs used in these notes, each of them costs 5.)

Here is the completed cost matrix (rows S and T were completed first, then E, then P):

x/y  ε  A  P  E
ε    0  1  2  3
S    1  1  2  3
T    2  2  2  3
E    3  3  3  2
P    4  4  3  3

To extract the minimum-cost alignment given above, we examine the sequence of choices we made to arrive at S(4, 3) = 3, following the minimizing choices backwards: S(4, 3) came from S(3, 3) (delete P), S(3, 3) from S(2, 2) (no-op, E = E), S(2, 2) from S(1, 1) (sub), and S(1, 1) from S(0, 0) (sub). In general, every such path from S(0, 0) to S(4, 3) corresponds to a minimum-cost alignment, and here there is exactly one. Vertical or horizontal moves represent indel operations, single-diagonal moves are a sub (or a no-op), and double-diagonal moves are a swap.

1.6 A large worked example

Consider aligning the strings x = EXPONENTIAL and y = POLYNOMIAL.[6] The full matrix S of costs is shown in the figure below; it is produced by starting at i, j = 0 and applying cost(i, j) as given above iteratively to each element (or by starting at i = $n_x$ and j = $n_y$ and making the recursive calls). Let us focus on a small piece of the overall calculation: aligning EXP and POLY. The cost is given by

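$$
\begin{aligned}
\mathrm{cost}(3, 4) &= \min\{\mathrm{cost}(1, 2) + 3,\ \mathrm{cost}(2, 3) + 1,\ \mathrm{cost}(2, 4) + 1,\ \mathrm{cost}(3, 3) + 1\} \\
&= \min\{2 + 3,\ 3 + 1,\ 4 + 1,\ 3 + 1\} = \min\{5, 4, 5, 4\} \\
&= 4
\end{aligned}
$$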

The overall minimum cost of 6 is in the bottom-right corner of S. Note, however, that the cost matrix does not itself contain the corresponding alignment. Given the completed matrix, we may extract the corresponding alignment by starting in the bottom-right corner and finding the minimum-cost path backwards through the DAG to S(0, 0).

[Figure reproduced from the CSCI 5454 (CU Boulder) Sequence Alignment lecture notes by Christopher Aicher & Ryan Hand, April 2, 2013. Left, Figure 3: Subproblem Alignment Costs of Exponential vs. Polynomial. Right, Figure 4: Subproblem Alignment Path of Exponential vs. Polynomial.]

The right-hand panel of the figure shows this path, whose corresponding alignment is

--POLYNOMIAL
ii||ss|ds|||
EXPONEN-TIAL

for a cost of 3 indels and 3 subs, or 6 overall.

[6] This example is taken from Dasgupta, Papadimitriou and Vazirani’s excellent book Algorithms (2006).
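A sketch of this backward walk, assuming the costs from Section 1.1. Rather than storing back-pointers, the hypothetical helper `trace_back` re-derives, at each cell, a predecessor whose value plus the edit cost reproduces S[i][j]. When several predecessors tie it prefers sub, then delete, then insert, so it may return a different minimum-cost alignment than the one in the figure; it also prints x on the top row.

```python
def cost_matrix(x, y, c_sub=1, c_indel=1, c_swap=3):
    """Fill S[i][j] = cost(i, j) bottom-up, one row at a time."""
    S = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i == 0 and j == 0:
                continue                                  # base case
            options = []
            if i >= 1 and j >= 1:
                same = x[i - 1] == y[j - 1]
                options.append(S[i - 1][j - 1] + (0 if same else c_sub))
            if i >= 1:
                options.append(S[i - 1][j] + c_indel)
            if j >= 1:
                options.append(S[i][j - 1] + c_indel)
            if i >= 2 and j >= 2:
                options.append(S[i - 2][j - 2] + c_swap)
            S[i][j] = min(options)
    return S


def trace_back(x, y, S, c_sub=1, c_indel=1):
    """Walk from S[n_x][n_y] back to S[0][0], each time stepping to a predecessor
    whose value plus the edit cost reproduces the current cell."""
    cols = []                              # (x part, op codes, y part), back to front
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i >= 1 and j >= 1:
            same = x[i - 1] == y[j - 1]
            if S[i][j] == S[i - 1][j - 1] + (0 if same else c_sub):
                cols.append((x[i - 1], "|" if same else "s", y[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i >= 1 and S[i][j] == S[i - 1][j] + c_indel:
            cols.append((x[i - 1], "d", "-"))      # x_i aligned with a gap
            i -= 1
        elif j >= 1 and S[i][j] == S[i][j - 1] + c_indel:
            cols.append(("-", "i", y[j - 1]))      # y_j aligned with a gap
            j -= 1
        else:                                      # only a swap explains S[i][j]
            cols.append((x[i - 2:i], "tt", y[j - 2:j]))
            i, j = i - 2, j - 2
    cols.reverse()
    return tuple("".join(col[k] for col in cols) for k in range(3))


x, y = "EXPONENTIAL", "POLYNOMIAL"
S = cost_matrix(x, y)
print(S[len(x)][len(y)])   # 6
print(S[3][4])             # 4  (aligning EXP with POLY)
top, ops, bottom = trace_back(x, y, S)
print(top, ops, bottom, sep="\n")
```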
1.7 Correctness

We now prove that this algorithm is correct, i.e., that it finds a minimum-cost alignment. As usual with recursive functions, we provide a proof by induction on the lengths of the leading substrings (prefixes) of x and y being aligned.

Claim: For every i and j, the value cost(i, j) given by the recursion equals the minimum cost of aligning $x_{1\ldots i}$ with $y_{1\ldots j}$; hence an alignment that follows the minimizing choices of cost(i, j) is a minimum-cost alignment.

Proof: First, we dispense with the base case of aligning two strings of length 0. The cost here must be 0 because there are no letters to align and there can be no edit operations. Thus, cost(0, 0) = 0.

Now, assume that we have correctly calculated the minimum cost of aligning $x_{1\ldots k}$ with $y_{1\ldots \ell}$ for every pair (k, ℓ) ≠ (i, j) with k ≤ i and ℓ ≤ j. There are only four possible previous subalignments to consider, each of which corresponds to the last edit operation used:

• Transpose: First, we swap $x_{i-1}$ and $x_i$, and then we substitute them for $y_{j-1}$ and $y_j$, respectively. These three edits together cost c(swap), by definition. The remaining cost is from aligning $x_{1\ldots i-2}$ with $y_{1\ldots j-2}$, whose minimum cost is cost(i − 2, j − 2). Therefore, the minimum cost ending with a swap is cost(i − 2, j − 2) + c(swap).

• Substitute: We substitute the value at $x_i$ for the value at $y_j$. This costs c(sub) by definition (0 if $x_i = y_j$). The remaining cost is from aligning $x_{1\ldots i-1}$ with $y_{1\ldots j-1}$, whose minimum cost is cost(i − 1, j − 1). Therefore, the minimum cost ending with a sub is cost(i − 1, j − 1) + c(sub).

• Delete in x and insert in y: We add a gap character after $y_j$ to match $x_i$. This costs c(indel) by definition. The remaining cost is from aligning $x_{1\ldots i-1}$ with $y_{1\ldots j}$, whose minimum cost is cost(i − 1, j). Therefore, the minimum cost ending with this indel is cost(i − 1, j) + c(indel).

• Insert in x and delete in y: We add a gap character after $x_i$ to match $y_j$. This costs c(indel) by definition. The remaining cost is from aligning $x_{1\ldots i}$ with $y_{1\ldots j-1}$, whose minimum cost is cost(i, j − 1). Therefore, the minimum cost ending with this indel is cost(i, j − 1) + c(indel).

In each case the remaining part must itself be aligned at minimum cost: if it were not, replacing it with a cheaper alignment of the same prefixes would lower the total cost. Because cost(i, j) is defined as the minimum over these four possibilities, and because they are the only ways to end an alignment of $x_{1\ldots i}$ with $y_{1\ldots j}$, the recursion must give the minimum cost of aligning them.
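The proof can be complemented by a quick empirical check. This sketch (hypothetical helper names; costs from Section 1.1) compares the recursion against a brute-force minimum over every gapped alignment of all strings of length at most 3 over the alphabet {a, b}. The brute force omits swaps, which is harmless here: with c(swap) = 3, a swap is never cheaper than the two subs (cost at most 2) that achieve the same effect.

```python
from functools import lru_cache
from itertools import product


def recursion_cost(x, y, c_sub=1, c_indel=1, c_swap=3):
    """cost(n_x, n_y) computed by the memoized recursion from Section 1.4."""

    @lru_cache(maxsize=None)
    def cost(i, j):
        if i == 0 and j == 0:
            return 0
        options = []
        if i >= 1 and j >= 1:
            options.append(cost(i - 1, j - 1) + (0 if x[i - 1] == y[j - 1] else c_sub))
        if i >= 1:
            options.append(cost(i - 1, j) + c_indel)
        if j >= 1:
            options.append(cost(i, j - 1) + c_indel)
        if i >= 2 and j >= 2:
            options.append(cost(i - 2, j - 2) + c_swap)
        return min(options)

    return cost(len(x), len(y))


def all_alignments(x, y):
    """Yield every gapped alignment (top, bottom) of x and y."""
    if not x and not y:
        yield "", ""
        return
    if x and y:
        for t, b in all_alignments(x[1:], y[1:]):
            yield x[0] + t, y[0] + b
    if x:
        for t, b in all_alignments(x[1:], y):
            yield x[0] + t, "-" + b
    if y:
        for t, b in all_alignments(x, y[1:]):
            yield "-" + t, y[0] + b


def column_cost(top, bottom):
    """Unit cost of an explicit alignment: every non-matching column costs 1."""
    return sum(0 if a == b else 1 for a, b in zip(top, bottom))


words = ["".join(w) for n in range(4) for w in product("ab", repeat=n)]
mismatches = [
    (x, y)
    for x in words
    for y in words
    if recursion_cost(x, y) != min(column_cost(t, b) for t, b in all_alignments(x, y))
]
print(len(words), "strings,", len(mismatches), "mismatches")  # 15 strings, 0 mismatches
```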

1.8 Pseudocode and running time

Although a recursive algorithm that fills in the matrix S is easy to define, an iterative algorithm is almost as easy to write down. Much like the iterative algorithm for the 0-1 Knapsack problem, the iterative sequence alignment algorithm begins at the base case and fills in every column of a row before moving on to the next row. Furthermore, without using asymptotically more space than S, we may also construct the alignment itself in parallel with filling in S. The algorithm below is a simple generalization of the one originally given by Needleman and Wunsch in 1970.

input: x with length nx and y with length ny

initialize S, of dimensions (nx+1) by (ny+1)
initialize p, of dimensions (nx+1) by (ny+1)

S[0,0] = 0
p[0,0] = NULL
for i = 0 to nx                               // consider all letters of x
    for j = 0 to ny                           // consider all letters of y
        if i > 0 or j > 0                     // skip the base case
            S[i,j] = cost(i,j)                // minimum cost up to xi and yj
            p[i,j] = argmin of cost(i,j)      // record which branch achieved the minimum
        end
    end
end
return S[nx,ny] and the alignment found by following p from p[nx,ny] back to p[0,0]

where we have used the definition of cost(i, j) given above.
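A runnable Python sketch of the pseudocode, assuming the costs from Section 1.1. The function name `align` and the operation codes (| no-op, s sub, d delete, i insert, t swap) are illustrative. Here `p[i][j]` stores the winning operation together with how far it steps back in x and y, and ties go to the first option listed (sub, delete, insert, swap).

```python
def align(x, y, c_sub=1, c_indel=1, c_swap=3):
    """Fill S and back-pointers p row by row, then follow p back to (0, 0)."""
    nx, ny = len(x), len(y)
    S = [[0] * (ny + 1) for _ in range(nx + 1)]
    p = [[None] * (ny + 1) for _ in range(nx + 1)]    # p[i][j] = (op, di, dj)
    for i in range(nx + 1):                           # consider all letters of x
        for j in range(ny + 1):                       # consider all letters of y
            if i == 0 and j == 0:
                continue                              # base case S[0][0] = 0
            options = []                              # (cost, op, di, dj)
            if i >= 1 and j >= 1:
                same = x[i - 1] == y[j - 1]
                options.append((S[i - 1][j - 1] + (0 if same else c_sub),
                                "|" if same else "s", 1, 1))
            if i >= 1:
                options.append((S[i - 1][j] + c_indel, "d", 1, 0))
            if j >= 1:
                options.append((S[i][j - 1] + c_indel, "i", 0, 1))
            if i >= 2 and j >= 2:
                options.append((S[i - 2][j - 2] + c_swap, "t", 2, 2))
            best = min(options, key=lambda o: o[0])   # first minimum wins ties
            S[i][j], p[i][j] = best[0], best[1:]
    chunks = []
    i, j = nx, ny
    while (i, j) != (0, 0):                           # follow the recorded branches
        op, di, dj = p[i][j]
        chunks.append((x[i - di:i] if di else "-", op * max(di, dj),
                       y[j - dj:j] if dj else "-"))
        i, j = i - di, j - dj
    chunks.reverse()
    top, ops, bottom = ("".join(c[k] for c in chunks) for k in range(3))
    return S[nx][ny], top, ops, bottom


print(align("STEP", "APE"))  # (3, 'STEP', 'ss|d', 'APE-')
```

Storing p costs another (nx + 1) × (ny + 1) table, but it turns alignment recovery into a simple walk with no recomputation.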

Assuming that each call to cost(i, j) takes constant time, and we carry out $(n_x + 1)(n_y + 1) - 1 = O(n_x n_y) = O(n^2)$ of them, the running time is $O(n^2)$. (Note that x and y are treated symmetrically, and we may simply adopt the convention of calling the longer of the two lengths n.) The space requirement is given by the size of S and p, which are also $O(n^2)$.

There are more space-efficient versions of this algorithm. For instance, notice that cost(i, j) only
ever refers to elements at most two rows up or two columns left of the current problem parameters.
Thus, we may calculate the final solution by only storing three rows of S. (Do you see why we need
three entire rows, rather than a 3 × 3 submatrix with S(i, j) as the bottom-right element?) Now,
the space requirement is only O(n), but we must also give up the matrix p which means we lose the
record of the optimal alignment. In 1975, Hirschberg gave a clever divide-and-conquer algorithm
that solves both problems.
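A sketch of the three-row variant, assuming the same costs. It returns only the minimum cost, since discarding old rows also discards the information needed to recover the alignment; the function name is hypothetical.

```python
def edit_distance_three_rows(x, y, c_sub=1, c_indel=1, c_swap=3):
    """cost(n_x, n_y) using only rows i - 2, i - 1 and i of S at any time."""
    ny = len(y)
    two_up = one_up = None                 # rows i - 2 and i - 1 (not built yet)
    for i in range(len(x) + 1):
        row = [0] * (ny + 1)               # row i of S
        for j in range(ny + 1):
            if i == 0:
                row[j] = j                 # base cases: cost(0, j) = j
                continue
            options = [one_up[j] + c_indel]                          # delete x_i
            if j >= 1:
                options.append(row[j - 1] + c_indel)                 # insert y_j
                same = x[i - 1] == y[j - 1]
                options.append(one_up[j - 1] + (0 if same else c_sub))  # sub / no-op
            if i >= 2 and j >= 2:
                options.append(two_up[j - 2] + c_swap)               # swap
            row[j] = min(options)
        two_up, one_up = one_up, row       # slide the window down one row
    return one_up[ny]


print(edit_distance_three_rows("STEP", "APE"))                # 3
print(edit_distance_three_rows("EXPONENTIAL", "POLYNOMIAL"))  # 6
```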

2 On your own
1. Read Chapter 15
