Affine Gap
BCB 597
Fall, 2007
Multiple Base Insertion/Deletion Mutation Event
Constant Gap Penalty
• Each gap character is penalized a constant
amount: γ
• Not a good gap penalty for our mutation event
model
• What if we had a more general gap penalty
function?
General Gap Penalty Function
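• Gap penalty function: W(k), the penalty for a gap of length k
• Each entry in the dynamic programming table
now takes O(n) time to fill.
• O(n³) running time
• Cannot directly apply space saving
• O(nm) space

$$
S[0,0] = 0, \qquad S[i,0] = -W(i), \qquad S[0,j] = -W(j)
$$

$$
S[i,j] = \max \begin{cases}
S[i-1,j-1] + \sigma(i-1,j-1) \\
\max_{1 \le k \le i} \left( S[i-k,j] - W(k) \right) \\
\max_{1 \le k \le j} \left( S[i,j-k] - W(k) \right)
\end{cases}
$$

where σ(i−1, j−1) is the score for aligning AA[i−1] with BA[j−1].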
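The recursion above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `general_gap_score` and the `match`/`mismatch` defaults (standing in for σ) are assumptions made for the example, and `W` is any user-supplied penalty function of the gap length.

```python
from typing import Callable


def general_gap_score(a: str, b: str, W: Callable[[int], float],
                      match: float = 1.0, mismatch: float = -1.0) -> float:
    """Global alignment score under an arbitrary gap penalty W(k).

    match/mismatch are illustrative stand-ins for the substitution score.
    """
    n, m = len(a), len(b)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -W(i)
    for j in range(1, m + 1):
        S[0][j] = -W(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best = S[i - 1][j - 1] + sub
            # Every possible gap length must be tried: this inner work is
            # the O(n) cost per table entry.
            for k in range(1, i + 1):
                best = max(best, S[i - k][j] - W(k))
            for k in range(1, j + 1):
                best = max(best, S[i][j - k] - W(k))
            S[i][j] = best
    return S[n][m]


# An affine penalty is one special case of W.
print(general_gap_score("AAA", "A", lambda k: 3 + k))
```

The two inner loops over k are what push the running time to cubic; the affine form introduced next is what allows them to be removed.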
Affine Gap Penalty
• W(i)=g+hi (for i >= 1)
• g: gap opening penalty
• h: gap extension penalty
• The ratio between g and h determines how much
weight we give to the size of the gap.
– Small g, Large h: size more important
– Large g, Small h: size less important
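A small sketch of W(i) = g + hi may make the two regimes concrete. The function name `affine_gap_penalty` and the parameter pairs below are illustrative choices, not values from the lecture.

```python
def affine_gap_penalty(length: int, g: float, h: float) -> float:
    """W(i) = g + h*i for a gap of i >= 1 characters."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return g + h * length


# Compare one gap of length 4 with four separate gaps of length 1.
for g, h in [(1, 5), (10, 1)]:
    one_long = affine_gap_penalty(4, g, h)
    four_short = 4 * affine_gap_penalty(1, g, h)
    print(f"g={g}, h={h}: one gap of 4 costs {one_long}, "
          f"four gaps of 1 cost {four_short}")
```

With small g and large h the two totals are close (21 versus 24), so the total number of gap characters dominates. With large g and small h they differ widely (14 versus 44), so the number of gap openings dominates.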
Remember our “What if?”
• E(AA, BA) = E(AA[0,i], BA [0,i]) + E(AA[i+1,L], BA[i+1,L])
Alignment Decomposition
No Longer Behaves
optimal
AGCTAGAGACCAG---TCT-GAGGTAGA
AGCTAGAGACCAGCTATCTAGAGGTAGA
1, 1, g = 3, h = 1
However, this was a sufficient condition, not a necessary condition!
Amortize the cost of opening
• Consider filling out the dynamic programming table.
• The cost of g is eventually spread out over the length of the gap
once the gap is closed.
• Gap opening might initially cause the alignment score to be
less than that of a mismatch.
• However, the one-time loss might eventually become “worth it.”
• We cannot look ahead to know how much the gap opening
will cost per gap character; if we could, we would know
directly whether the gap opening will be worth it.

Mismatches (left) compared with a single gap (right):

AGCTAAGGAA        AGCTAAGGA-
AGCTAAGGAC    >   AGCTAAGGAC

AGCTAAGGAAA       AGCTAAGGA--
AGCTAAGGACT   >   AGCTAAGGACT

AGCTAAGGAAAA      AGCTAAGGA---
AGCTAAGGACTT  <   AGCTAAGGACTT
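The amortization can be seen by dividing the total penalty g + hk by the gap length k. The sketch below uses g = 3 and h = 1 from the earlier example; the helper name `cost_per_gap_character` is an assumption made for illustration.

```python
def cost_per_gap_character(k: int, g: float, h: float) -> float:
    """Average penalty per gap character for one gap of length k."""
    return (g + h * k) / k


costs = [cost_per_gap_character(k, g=3, h=1) for k in range(1, 5)]
print(costs)  # [4.0, 2.5, 2.0, 1.75]
```

The per-character cost falls toward h as the gap grows, but the final length is not known while the table is being filled, which is why the decision cannot be made at the moment the gap is opened.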
Three Dynamic Programming Tables
• Instead of looking to the future, we simply store
the best possible alignments that end in gaps, in
case they might eventually become “worth it.”
• Table GA: Stores the best alignment score, under the
restriction that the last character in AA be aligned
with a gap.
• Table GB: Stores the best alignment score, under
the restriction that the last character in BA be
aligned with a gap.
• Table S: Stores the best alignment score with no
restrictions.
Three Recursions
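$$
G_A[0,j] = -\infty, \qquad G_B[i,0] = -\infty
$$

$$
S[0,0] = 0, \qquad S[i,0] = -g - ih, \qquad S[0,j] = -g - jh
$$

$$
G_A[i,j] = \max \begin{cases}
G_A[i-1,j] - h \\
S[i-1,j] - g - h
\end{cases}
$$

$$
G_B[i,j] = \max \begin{cases}
G_B[i,j-1] - h \\
S[i,j-1] - g - h
\end{cases}
$$

$$
S[i,j] = \max \begin{cases}
S[i-1,j-1] + \sigma(i-1,j-1) \\
G_A[i,j] \\
G_B[i,j]
\end{cases}
$$

where σ(i−1, j−1) is the score for aligning AA[i−1] with BA[j−1].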
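A minimal Python sketch of the three recursions follows. The function name `affine_global_score` and the `match`/`mismatch` defaults (standing in for the substitution score) are assumptions for illustration; g and h are the opening and extension penalties.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, g: float, h: float,
                        match: float = 1, mismatch: float = -1) -> float:
    """Global alignment score with gap penalty W(i) = g + h*i."""
    n, m = len(a), len(b)
    # GA[0][j] and GB[i][0] stay at -infinity: no alignment can end with
    # a character of an empty prefix aligned to a gap.
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GA = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GB = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for i in range(1, n + 1):
        S[i][0] = -g - i * h
    for j in range(1, m + 1):
        S[0][j] = -g - j * h
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (cost h) or open a new one (cost g + h).
            GA[i][j] = max(GA[i - 1][j] - h, S[i - 1][j] - g - h)
            GB[i][j] = max(GB[i][j - 1] - h, S[i][j - 1] - g - h)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, GA[i][j], GB[i][j])
    return S[n][m]


print(affine_global_score("AAA", "A", g=3, h=1))  # -4
```

Each cell now needs a constant number of comparisons, so the running time is O(nm) rather than cubic.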
Traceback
• The path can now travel between the three tables
• Start in table S
• If S[i,j] = GA[i,j] or GB[i,j], jump to the
corresponding table, without making a
“move.”
• Paths through GA must move up
• Paths through GB must move left
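The traceback rules above can be sketched as a small state machine over the three tables. This is one possible reading, under stated assumptions: the helper names `fill_tables` and `traceback` are hypothetical, the `match`/`mismatch` defaults are illustrative, ties are broken in favor of the gap tables, and the boundary cells are filled by the same recursions rather than by explicit base cases (which gives the same values).

```python
NEG_INF = float("-inf")


def fill_tables(a, b, g, h, match=1, mismatch=-1):
    """Fill S, GA, GB for global alignment with penalty g + h*i."""
    n, m = len(a), len(b)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GA = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GB = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0:
                GA[i][j] = max(GA[i - 1][j] - h, S[i - 1][j] - g - h)
            if j > 0:
                GB[i][j] = max(GB[i][j - 1] - h, S[i][j - 1] - g - h)
            diag = NEG_INF
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                diag = S[i - 1][j - 1] + sub
            S[i][j] = max(diag, GA[i][j], GB[i][j])
    return S, GA, GB


def traceback(a, b, S, GA, GB, h):
    """Recover one optimal alignment, starting in table S at (n, m)."""
    i, j, table = len(a), len(b), "S"
    top, bottom = [], []
    while i > 0 or j > 0:
        if table == "S":
            if S[i][j] == GA[i][j]:
                table = "GA"            # jump tables without moving
            elif S[i][j] == GB[i][j]:
                table = "GB"
            else:                       # diagonal move
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
        elif table == "GA":             # paths through GA move up
            if GA[i][j] != GA[i - 1][j] - h:
                table = "S"             # the gap was opened from S here
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:                           # paths through GB move left
            if GB[i][j] != GB[i][j - 1] - h:
                table = "S"
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


S, GA, GB = fill_tables("AAA", "A", g=3, h=1)
print(traceback("AAA", "A", S, GA, GB, h=1))  # ('AAA', 'A--')
```

Tracking the current table explicitly is what lets the traceback distinguish extending a gap from opening one, which the values in S alone cannot do.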
Semiglobal Alignment
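$$
G_A[0,j] = -\infty, \qquad G_B[i,0] = -\infty
$$

$$
S[0,0] = 0, \qquad S[i,0] = 0, \qquad S[0,j] = 0
$$

$$
G_A[i,j] = \max \begin{cases}
G_A[i-1,j] - h \\
S[i-1,j] - g - h
\end{cases}
$$

$$
G_B[i,j] = \max \begin{cases}
G_B[i,j-1] - h \\
S[i,j-1] - g - h
\end{cases}
$$

$$
S[i,j] = \max \begin{cases}
S[i-1,j-1] + \sigma(i-1,j-1) \\
G_A[i,j] \\
G_B[i,j]
\end{cases}
$$

where σ(i−1, j−1) is the score for aligning AA[i−1] with BA[j−1].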
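A sketch of the semiglobal variant follows. The zero boundary values come from the recursions above; reading the final score as the maximum over the last row and last column of S is an assumption made here (the usual convention for free trailing gaps), as are the function name `affine_semiglobal_score` and the `match`/`mismatch` defaults.

```python
NEG_INF = float("-inf")


def affine_semiglobal_score(a: str, b: str, g: float, h: float,
                            match: float = 1, mismatch: float = -1) -> float:
    """Semiglobal score: leading and trailing gaps are free."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]   # S[i][0] = S[0][j] = 0
    GA = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GB = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            GA[i][j] = max(GA[i - 1][j] - h, S[i - 1][j] - g - h)
            GB[i][j] = max(GB[i][j - 1] - h, S[i][j - 1] - g - h)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, GA[i][j], GB[i][j])
    # Assumption: trailing gaps are free, so the alignment may end
    # anywhere in the last row or the last column.
    return max(max(S[n]), max(row[m] for row in S))


print(affine_semiglobal_score("ACGT", "TTACGTTT", g=3, h=1))  # 4
```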
Local Alignment
• The key to Local Alignment is that the aligned
region must still start and end with a matching
character.
• This means that a local alignment path must
start and end in table S.
• Thus finding a local alignment with affine gaps is very
similar to local alignment for constant gap penalty.
Local Alignment
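$$
G_A[0,j] = -\infty, \qquad G_B[i,0] = -\infty
$$

$$
S[0,0] = 0, \qquad S[i,0] = 0, \qquad S[0,j] = 0
$$

$$
G_A[i,j] = \max \begin{cases}
G_A[i-1,j] - h \\
S[i-1,j] - g - h
\end{cases}
$$

$$
G_B[i,j] = \max \begin{cases}
G_B[i,j-1] - h \\
S[i,j-1] - g - h
\end{cases}
$$

$$
S[i,j] = \max \begin{cases}
S[i-1,j-1] + \sigma(i-1,j-1) \\
G_A[i,j] \\
G_B[i,j] \\
0
\end{cases}
$$

where σ(i−1, j−1) is the score for aligning AA[i−1] with BA[j−1].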
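The local recursion differs from the global one only in the zero boundaries and the extra 0 in the maximum for S. A minimal sketch, with the function name `affine_local_score` and the `match`/`mismatch` defaults as illustrative assumptions:

```python
NEG_INF = float("-inf")


def affine_local_score(a: str, b: str, g: float, h: float,
                       match: float = 1, mismatch: float = -1) -> float:
    """Best local alignment score with gap penalty g + h*i."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    GA = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    GB = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            GA[i][j] = max(GA[i - 1][j] - h, S[i - 1][j] - g - h)
            GB[i][j] = max(GB[i][j - 1] - h, S[i][j - 1] - g - h)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 restarts the alignment; it appears only in S, so a
            # local alignment cannot begin inside a gap.
            S[i][j] = max(S[i - 1][j - 1] + sub, GA[i][j], GB[i][j], 0)
            best = max(best, S[i][j])
    return best


print(affine_local_score("GGGACGTCCC", "TTTACGTAAA", g=3, h=1))  # 4
```

Taking the maximum over S only, and not over GA or GB, mirrors the requirement that the path end in table S.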
Applying Space Saving To Affine Gap Alignment
Finding the Intersection
S[i, j ] S[i 1, j ] g h j j
G [i, j ] G [i 1, j ] g h j j
A A
max S[i, j ] G A [i 1, j ] h j j
G [i, j ] S[i 1, j ] h
j
j j
A
S[i, j ] S [i 1, j 1] (i, j ) j j 1
Diagonal Intersection
[Figure: diagonal intersection at cell (i, j), with neighboring cells labeled by gap costs g+h, g+2h, g+3h, …]
Vertical Intersection
[Figure: vertical intersection, with neighboring cells labeled by gap costs h, 2h, … and g+h, g+2h, g+3h, …]