Lec10 12 Edit Distance
Lecture : 10
Edit distance (w1,w2)
(Cost of all Edit Operations)
===================================================================================
▪ We often see a system (e.g., a spell checker) suggest a set of words that are
similar to a mistyped word.
DNA Sequence Alignment
▪ In computational biology, DNA sequence alignment is an important
problem:
• X : ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
• Y : GTCGTTCGGAATGCGTTGCTCTGTAA
▪ What is the best possible alignment between the two DNA sequences above?
▪ "Best" can be defined, for example, as the alignment with the minimum edit
cost, i.e., one that achieves the minimum edit distance between X and Y.
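On what basis does the system suggest these words?
❑ Example – 01 : SENTANCE to SENTENCE
• Source word 𝑤1 : S E N T A N C E
• Target word 𝑤2 : S E N T E N C E
• Operation      :         s
  (s = substitution, i = insertion, d = deletion)
• A single edit is needed: substitute E for A to obtain the target word from the source
word.
• Hence, 𝐷(𝑤1, 𝑤2) = cost(s) = 1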
Computing Edit Distance
❑ Example – 02 : SENTANSE to SENTENCE
• Source word 𝑤1 : S E N T A N S E
• Target word 𝑤2 : S E N T E N C E
• Operation      :         s   s
• Two edit operations are needed: substitute E for A and substitute C for S to obtain the
target word from the source word.
• Hence, 𝐷(𝑤1, 𝑤2) = cost(s) + cost(s) = 1 + 1 = 2
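When the two words have the same length and the alignment uses substitutions only (as in Examples 01 and 02), its cost is simply the number of positions where the letters differ. A minimal Python sketch, assuming a substitution cost of 1 as in these examples; `substitution_only_cost` is a hypothetical helper name. Note that this is the cost of one particular gap-free alignment, so in general it is only an upper bound on the edit distance.

```python
def substitution_only_cost(w1: str, w2: str, sub_cost: int = 1) -> int:
    """Cost of aligning two equal-length words position by position,
    using substitutions only (no insertions or deletions)."""
    if len(w1) != len(w2):
        raise ValueError("words must have the same length")
    # Every position where the letters differ needs one substitution.
    return sum(sub_cost for a, b in zip(w1, w2) if a != b)


print(substitution_only_cost("SENTANCE", "SENTENCE"))  # 1
print(substitution_only_cost("SENTANSE", "SENTENCE"))  # 2
```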
Computing Edit Distance
❑ Example – 03 : SNTANSE to SENTENCE
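• Source word 𝑤1 : S * N T A N S E
• Target word 𝑤2 : S E N T E N C E
• Operation      :   i     s   s
• Hence, with every operation costing 1, 𝐷(𝑤1, 𝑤2) = cost(i) + cost(s) + cost(s) = 1 + 1 + 1 = 3
• Another alignment, with a deletion, an insertion and two substitutions (cost 4 with unit costs):
• Source word 𝑤1 : S S * N T A N S E
• Target word 𝑤2 : * S E N T E N C E
• Operation      : d   i     s   s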
❑ Example : INTENTION to EXECUTION
• Source word 𝑤1 : I N T E N T I O N
• Target word 𝑤2 : E X E C U T I O N
• Operation      : s s s s s
• Source word 𝑤1 : I N T E * N T I O N
• Target word 𝑤2 : * E X E C U T I O N
• Operation      : d s s   i s
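To compare alignments written as two rows like the ones above, one can add up the cost of each column. A minimal Python sketch, assuming `*` marks a gap as in the rows above and treating the three operation costs as parameters; `alignment_cost` is a hypothetical helper, not something defined in the lecture. With unit costs, both INTENTION alignments cost 5; with the substitution cost of 2 used later (Levenshtein v2), the second one is cheaper (8 versus 10).

```python
GAP = "*"  # gap symbol, as used in the alignment rows


def alignment_cost(src_row: str, tgt_row: str,
                   ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Cost of one alignment given as two equal-length rows.

    Column (a, *) is a deletion, (*, b) an insertion,
    (a, b) with a != b a substitution, and (a, a) costs nothing.
    """
    if len(src_row) != len(tgt_row):
        raise ValueError("alignment rows must have the same length")
    cost = 0
    for a, b in zip(src_row, tgt_row):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot be a gap in both rows")
        if b == GAP:
            cost += del_cost   # d: a source character is removed
        elif a == GAP:
            cost += ins_cost   # i: a target character is inserted
        elif a != b:
            cost += sub_cost   # s: one character replaced by another
    return cost


print(alignment_cost("S*NTANSE", "SENTENCE"))                    # 3
print(alignment_cost("INTENTION", "EXECUTION", sub_cost=2))      # 10
print(alignment_cost("INTE*NTION", "*EXECUTION", sub_cost=2))    # 8
```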
▪ Large problem at hand: one sequence of edit operations that turns INTENTION into EXECUTION
• I N T E N T I O N
• Delete “I” → N T E N T I O N
• Substitute N by E → E T E N T I O N
• Substitute T by X → E X E N T I O N
• Insert U → E X E N U T I O N
• Substitute N by C → E X E C U T I O N
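The operation sequence above can be replayed mechanically to confirm that it ends at EXECUTION and to total its cost. A small Python sketch; `apply_ops` and the (operation, position, character) encoding are assumptions made for illustration, with 0-based positions counted in the string as it stands at each step.

```python
def apply_ops(word, ops):
    """Apply (operation, position, char) edits in order and return every
    intermediate string. Positions are 0-based indices into the string
    as it is at that step."""
    steps = [word]
    for op, pos, ch in ops:
        if op == "delete":
            word = word[:pos] + word[pos + 1:]
        elif op == "insert":
            word = word[:pos] + ch + word[pos:]
        elif op == "substitute":
            word = word[:pos] + ch + word[pos + 1:]
        else:
            raise ValueError(f"unknown operation: {op}")
        steps.append(word)
    return steps


# The five steps from the slide, INTENTION -> EXECUTION.
OPS = [
    ("delete", 0, None),      # Delete "I"
    ("substitute", 0, "E"),   # Substitute N by E
    ("substitute", 1, "X"),   # Substitute T by X
    ("insert", 4, "U"),       # Insert U
    ("substitute", 3, "C"),   # Substitute N by C
]

for s in apply_ops("INTENTION", OPS):
    print(" ".join(s))

unit = {"delete": 1, "insert": 1, "substitute": 1}
lev_v2 = {"delete": 1, "insert": 1, "substitute": 2}
print(sum(unit[op] for op, _, _ in OPS))    # 5
print(sum(lev_v2[op] for op, _, _ in OPS))  # 8
```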
ABV-Indian Institute of Information Technology and Management
Gwalior
Lecture : 11
▪ Let us define 𝐷(𝑖, 𝑗) : the edit distance between X[1…i] & Y[1…j]
• X[1…i] : the first 𝑖 characters of X
• Y[1…j] : the first 𝑗 characters of Y
• For X of length 𝑛 and Y of length 𝑚, the edit distance between X and Y is 𝐷(𝑛, 𝑚)
Wagner, Robert A., and Michael J. Fischer. "The string-to-string correction problem." Journal of the ACM (JACM) 21.1 (1974): 168-173.
Minimum Edit Distance Algorithm
▪ Dynamic Programming Approach:
▪ To compute 𝐷(𝑛, 𝑚):
• Compute 𝐷(𝑖, 𝑗) for small i and j
• Use the previously computed values to compute 𝐷(𝑖, 𝑗) for larger i and j
• Continue until 𝐷(𝑛, 𝑚) is obtained
Minimum Edit Distance Algorithm
▪ Dynamic Programming Formulation:
▪ Initialisation:
• 𝐷(0, 0) = 0
• 𝐷(𝑖, 0) = 𝑖 : delete all 𝑖 characters of X[1…i] to obtain the empty string
• 𝐷(0, 𝑗) = 𝑗 : insert all 𝑗 characters of Y[1…j] into the empty string
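• Recursion:
  • for i = 1 to n
    • for j = 1 to m
$$
D(i,j) = \min
\begin{cases}
D(i-1,\,j) + 1 & \text{(deletion)}\\
D(i,\,j-1) + 1 & \text{(insertion)}\\
D(i-1,\,j-1) + \begin{cases} 2 & \text{if } X[i] \neq Y[j] \\ 0 & \text{if } X[i] = Y[j] \end{cases} & \text{(substitution / no change)}
\end{cases}
$$
[Levenshtein (v2) is used here: insertion and deletion cost 1, substitution costs 2]
• Obtain 𝐷(𝑛, 𝑚)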
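The initialisation and recursion above translate almost line for line into a bottom-up table fill. A minimal Python sketch, assuming 0-based Python strings (so the slides' X[i] is `x[i - 1]`) and exposing the three costs as parameters whose defaults match Levenshtein (v2); `edit_distance_table` and `edit_distance` are our own illustrative names.

```python
from typing import List


def edit_distance_table(x: str, y: str, ins_cost: int = 1, del_cost: int = 1,
                        sub_cost: int = 2) -> List[List[int]]:
    """Return the full table D, where D[i][j] is the edit distance
    between x[:i] and y[:j] (the slides' X[1..i] and Y[1..j])."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialisation: D(i, 0) needs i deletions, D(0, j) needs j insertions.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    # Recursion: filling row by row guarantees that D(i-1, j), D(i, j-1)
    # and D(i-1, j-1) are already known when D(i, j) is computed.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Python strings are 0-indexed, so X[i] is x[i - 1].
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,   # delete X[i]
                          D[i][j - 1] + ins_cost,   # insert Y[j]
                          D[i - 1][j - 1] + diag)   # substitute or no change
    return D


def edit_distance(x: str, y: str, **costs: int) -> int:
    """D(n, m): the last entry of the table."""
    return edit_distance_table(x, y, **costs)[len(x)][len(y)]


print(edit_distance("LAYER", "YEAR"))                        # 3
print(edit_distance("INTENTION", "EXECUTION"))               # 8
print(edit_distance("INTENTION", "EXECUTION", sub_cost=1))   # 5
```

Filling the whole table costs O(n·m) time and space; keeping the full table (rather than just two rows) is what makes the later backtrace possible.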
Edit Distance Table
▪ Example – 06 : LAYER to YEAR
▪ Initialisation:
• 𝐷(0, 0) = 0
• 𝐷(𝑖, 0) = 𝑖 (e.g., 𝐷(4, 0) = 4)
• 𝐷(0, 𝑗) = 𝑗 (e.g., 𝐷(0, 2) = 2)

R  5
E  4
Y  3
A  2
L  1
#  0  1  2  3  4
   #  Y  E  A  R
(rows: 𝑖, source X = LAYER; columns: 𝑗, target Y = YEAR)
Edit Distance Table
▪ Example – 06 : LAYER to YEAR
▪ Using the recurrence above (deletion and insertion cost 1, substitution cost 2), the table is
filled one row at a time, left to right, starting from 𝑖 = 1 (L). For example:
• 𝐷(1, 1) = min(𝐷(0, 1) + 1, 𝐷(1, 0) + 1, 𝐷(0, 0) + 2) = min(2, 2, 2) = 2, since L ≠ Y
• 𝐷(2, 3) = min(𝐷(1, 3) + 1, 𝐷(2, 2) + 1, 𝐷(1, 2) + 0) = min(5, 5, 3) = 3, since A = A
▪ Completed table:

R  5  4  3  4  3
E  4  3  2  3  4
Y  3  2  3  4  5
A  2  3  4  3  4
L  1  2  3  4  5
#  0  1  2  3  4
   #  Y  E  A  R

▪ Hence, 𝐷(5, 4) = 3 is the edit distance between LAYER and YEAR.
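The completed table can be regenerated and printed in the same bottom-up layout (row # at the bottom) to check a hand-filled version. A Python sketch, assuming the Levenshtein (v2) costs of this example; `edit_distance_table` and `format_table` are illustrative names, and the fixed width of 2 per cell is a layout assumption that suits small tables.

```python
def edit_distance_table(x, y, ins_cost=1, del_cost=1, sub_cost=2):
    """D[i][j] = edit distance between x[:i] and y[:j] (same recurrence as above)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * del_cost
    for j in range(1, m + 1):
        D[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + diag)
    return D


def format_table(x, y, D):
    """Render D the way the slides draw it: row # at the bottom, the letters
    of x going up the left edge, the letters of y along the bottom."""
    lines = []
    for i in range(len(x), -1, -1):          # print the last row (i = n) first
        label = x[i - 1] if i > 0 else "#"
        lines.append(label + " " + " ".join(f"{v:2d}" for v in D[i]))
    lines.append("  " + " ".join(f"{c:>2}" for c in "#" + y))
    return "\n".join(lines)


print(format_table("LAYER", "YEAR", edit_distance_table("LAYER", "YEAR")))
```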
Lecture : 12
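▪ Backtrace : Example – 06 : LAYER to YEAR

R  5  4  3  4  3
E  4  3  2  3  4
Y  3  2  3  4  5
A  2  3  4  3  4
L  1  2  3  4  5
#  0  1  2  3  4
   #  Y  E  A  R

• Starting from 𝐷(5, 4) = 3, move back to the cell each value was computed from: a diagonal
step is a substitution (or no change if the letters match), a vertical step is a deletion, and a
horizontal step is an insertion.
• One minimum-cost alignment:
• Source word 𝑤1 : L A Y E * R
• Target word 𝑤2 : * * Y E A R
• Operation      : d d     i
• Cost = cost(d) + cost(d) + cost(i) = 1 + 1 + 1 = 3 = 𝐷(5, 4)
NOTE : There can be more than one path with the same cost.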
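The backtrace can be automated by walking from 𝐷(𝑛, 𝑚) back to 𝐷(0, 0) and, at each cell, stepping to a neighbour whose value (plus the operation cost) explains the current value. A Python sketch under stated assumptions: ties are broken in a fixed order (diagonal, then deletion, then insertion), so it returns only one of possibly several optimal alignments; `align` is a hypothetical name, and `*` marks a gap as in the slides.

```python
def align(x, y, ins_cost=1, del_cost=1, sub_cost=2):
    """Return (source_row, target_row, cost) for one minimum-cost alignment."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * del_cost
    for j in range(1, m + 1):
        D[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + diag)

    # Backtrace from (n, m); prefer diagonal, then deletion, then insertion.
    src, tgt = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            if D[i][j] == D[i - 1][j - 1] + diag:      # substitute / no change
                src.append(x[i - 1])
                tgt.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + del_cost:  # delete x[i-1]
            src.append(x[i - 1])
            tgt.append("*")
            i -= 1
        else:                                            # insert y[j-1]
            src.append("*")
            tgt.append(y[j - 1])
            j -= 1
    return "".join(reversed(src)), "".join(reversed(tgt)), D[n][m]


print(align("LAYER", "YEAR"))  # ('LAYE*R', '**YEAR', 3)
```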
Practice Problem – (Major – 2024)
Time : 5 min
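▪ Major-2024 (Solution-3a): insertion and deletion cost 2, substitution costs 3:
$$
D(i,j) = \min
\begin{cases}
D(i-1,\,j) + 2\\
D(i,\,j-1) + 2\\
D(i-1,\,j-1) + \begin{cases} 3 & \text{if } X[i] \neq Y[j] \\ 0 & \text{if } X[i] = Y[j] \end{cases}
\end{cases}
$$
• Obtain 𝐷(𝑛, 𝑚)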
▪ Major-2024 (Solution) : ANCHOR to ACTOR
▪ Initialisation:
• 𝐷(0, 0) = 0
• 𝐷(𝑖, 0) = 2 ∗ 𝑖
• 𝐷(0, 𝑗) = 2 ∗ 𝑗
▪ Recursion:
$$
D(i,j) = \min\{\, D(i-1,j)+2,\; D(i,j-1)+2,\; D(i-1,j-1) + (3 \text{ if } X[i]\neq Y[j] \text{ else } 0) \,\}
$$

R 12
O 10
H  8
C  6
N  4
A  2
#  0  2  4  6  8 10
   #  A  C  T  O  R
(rows: 𝑖, source X = ANCHOR; columns: 𝑗, target Y = ACTOR)
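▪ Major-2024 (Solution) : ANCHOR to ACTOR

R 12 10  8  9  7  5
O 10  8  6  7  5  7
H  8  6  4  5  7  9
C  6  4  2  4  6  8
N  4  2  3  5  7  9
A  2  0  2  4  6  8
#  0  2  4  6  8 10
   #  A  C  T  O  R

• Backtracing from 𝐷(6, 5) = 5 gives the alignment:
• Source word 𝑤1 : A N C H O R
• Target word 𝑤2 : A * C T O R
• Operation      :   d   s
• Cost = cost(d) + cost(s) = 2 + 3 = 5 = 𝐷(6, 5)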
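For practice problems like this one, a hand-filled table can be checked cell by cell against the recurrence. A Python sketch, assuming the rows are typed top to bottom exactly as drawn on the slide (last source letter first, row # last) and that the costs are insertion 2, deletion 2, substitution 3; `check_hand_table` is a hypothetical helper that returns every wrong cell as (𝑖, 𝑗, expected, given).

```python
def edit_distance_table(x, y, ins_cost, del_cost, sub_cost):
    """D[i][j] = weighted edit distance between x[:i] and y[:j]."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * del_cost
    for j in range(1, m + 1):
        D[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = 0 if x[i - 1] == y[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j] + del_cost,
                          D[i][j - 1] + ins_cost,
                          D[i - 1][j - 1] + diag)
    return D


def check_hand_table(x, y, slide_rows, ins_cost=2, del_cost=2, sub_cost=3):
    """Compare a hand-filled table (top row first, as on the slide)
    with the computed one; return (i, j, expected, given) for wrong cells."""
    if len(slide_rows) != len(x) + 1 or any(len(r) != len(y) + 1 for r in slide_rows):
        raise ValueError("table has the wrong shape")
    D = edit_distance_table(x, y, ins_cost, del_cost, sub_cost)
    rows = slide_rows[::-1]  # the slide lists the top row first; rows[i] is now row i
    return [(i, j, D[i][j], given)
            for i, row in enumerate(rows)
            for j, given in enumerate(row)
            if D[i][j] != given]


SLIDE_ROWS = [
    [12, 10, 8, 9, 7, 5],   # R
    [10, 8, 6, 7, 5, 7],    # O
    [8, 6, 4, 5, 7, 9],     # H
    [6, 4, 2, 4, 6, 8],     # C
    [4, 2, 3, 5, 7, 9],     # N
    [2, 0, 2, 4, 6, 8],     # A
    [0, 2, 4, 6, 8, 10],    # #
]
print(check_hand_table("ANCHOR", "ACTOR", SLIDE_ROWS))  # []
```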
Minimum Edit Distance as Search
▪ Example – 07 : INTENTION to EXECUTION

N  9
O  8
I  7
T  6
N  5
E  4
T  3
N  2
I  1
#  0  1  2  3  4  5  6  7  8  9
   #  E  X  E  C  U  T  I  O  N
Minimum Edit Distance as Search
▪ Example – 07 : INTENTION to EXECUTION (PATH-01)
[Figure: completed table with PATH-01 highlighted; legend: Substitute, No change, Delete, Insert]
Minimum Edit Distance as Search
▪ Example – 07 : INTENTION to EXECUTION
All Possible Alignments
• Every monotonically non-decreasing path from (0, 0) to (𝑛, 𝑚), built from diagonal,
vertical and horizontal unit steps, corresponds to a possible alignment between the
source string and the target string.
• The optimal alignment is the one with the minimum cost.
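Treating minimum edit distance as a search over all such paths quickly becomes expensive, which is why the table-filling approach is used instead. The number of monotone paths with unit steps right, up and diagonal is given by the Delannoy numbers; a short Python sketch that counts them with a table of the same shape (the function name `count_alignments` is our own):

```python
def count_alignments(n: int, m: int) -> int:
    """Number of monotone paths from (0, 0) to (n, m) using the steps
    (1, 0) deletion, (0, 1) insertion and (1, 1) substitution / no change."""
    # Along the two edges there is exactly one path (all deletions or all insertions).
    P = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The last step into (i, j) is vertical, horizontal or diagonal.
            P[i][j] = P[i - 1][j] + P[i][j - 1] + P[i - 1][j - 1]
    return P[n][m]


print(count_alignments(5, 4))  # LAYER vs YEAR: 681
print(count_alignments(9, 9))  # INTENTION vs EXECUTION: 1462563
```

So for two 9-letter words there are already 1,462,563 possible alignments, while the edit distance table has only 10 × 10 = 100 cells.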
Queries!