M2-longest_common_subsequence
Inspiration
• Biological applications often need to compare the DNA
of two (or more) different organisms
• A strand of DNA consists of a string of molecules called
bases, where the possible bases are adenine, guanine,
cytosine, and thymine
• Representing each of these bases by its initial letter, we can express a
strand of DNA as a string over the finite set {A, C, G, T}
• For example, the DNA of one organism may be
S1 = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA, and
the DNA of another organism may be
S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA.
• One reason to compare two strands of DNA is to
determine how “similar” the two strands are, as some
measure of how closely related the two organisms are
• We can define similarity in many different ways
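• First way: we can say that two DNA strands are similar
if one is a substring of the other
• Another way: find a third strand S3 whose bases all appear in both
S1 and S2, in the same order but not necessarily consecutively
• The longer the strand S3 we can find, the more similar S1 and S2 are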
Inspiration
• S1 = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
• S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA
• S3 is GTCGTCGGAAGCCGGCCGAA
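The claim that the bases of S3 appear, in order, in both S1 and S2 can be checked mechanically. The sketch below is only an illustration: the helper name `is_subsequence` is an assumption of this example, not something from the slides.

```python
def is_subsequence(candidate, text):
    """Return True if candidate's characters occur in text in the same
    order, not necessarily next to each other."""
    it = iter(text)
    # `ch in it` advances the iterator until it finds ch, so every later
    # character has to be matched strictly further to the right.
    return all(ch in it for ch in candidate)


S1 = "ACCGGTCGAGTGCGCGGAAGCCGGCCGAA"
S2 = "GTCGTTCGGAATGCCGTTGCTCTGTAAA"
S3 = "GTCGTCGGAAGCCGGCCGAA"

# S3 should be a subsequence of both strands.
print(is_subsequence(S3, S1), is_subsequence(S3, S2))
```

A single left-to-right pass over each strand is enough, so the check is linear in the strand length.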
Problem Statement
• A subsequence of a given sequence is just the given
sequence with zero or more elements left out
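To make the definition concrete, the following sketch lists every subsequence of a short string by choosing which positions to keep. The function name `subsequences` is hypothetical, and the brute-force approach is meant only for tiny inputs, since a sequence of length n has up to 2^n subsequences.

```python
from itertools import combinations


def subsequences(seq):
    """Return the set of all subsequences of seq (as strings)."""
    n = len(seq)
    result = set()
    for k in range(n + 1):
        # combinations() yields index tuples in increasing order, so the
        # kept elements always preserve their original relative order.
        for kept in combinations(range(n), k):
            result.add("".join(seq[i] for i in kept))
    return result


subs = subsequences("ABC")
print(sorted(subs, key=lambda s: (len(s), s)))
```

Note that "AC" is a subsequence of "ABC" (leave out B), while "CA" is not, because the order of the remaining elements cannot change.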
X and Y
Proof of Theorem 15.1
• We wish to show that it is an LCS
than k-1
an LCS of X and Y
Xi and Yj
• Given the value of c[i, j], we can determine in O(1) time which
of the three values c[i-1, j-1], c[i-1, j], and c[i, j-1] was used
to compute c[i, j], without inspecting table b.
Improving the code
• Thus, we can reconstruct an LCS in O(m+n) time using a
procedure similar to PRINT-LCS.
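A minimal sketch of this idea, assuming the standard textbook recurrence for the table c (c[i][j] is the LCS length of the first i elements of X and the first j elements of Y). The function names `lcs_table` and `reconstruct_lcs` are choices made for this example, and the tie-breaking rule when the two neighbours are equal is one possible convention.

```python
def lcs_table(x, y):
    """Fill the (m+1) x (n+1) table c of LCS lengths in O(m*n) time."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


def reconstruct_lcs(c, x, y):
    """Walk back from c[m][n] using only c (no b table).

    Each step decrements i, j, or both, so the loop runs at most
    m + n times."""
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            # c[i][j] came from the diagonal entry c[i-1][j-1].
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            # c[i][j] came from the entry above.
            i -= 1
        else:
            # c[i][j] came from the entry to the left.
            j -= 1
    return "".join(reversed(out))


x, y = "ABCBDAB", "BDCABA"
c = lcs_table(x, y)
z = reconstruct_lcs(c, x, y)
print(len(z), z)
```

Dropping b saves Θ(mn) space for the arrows, although the c table itself still needs Θ(mn) space for reconstruction.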