How to Calculate the Levenshtein Distance Between Two Strings in Java Using Recursion?
Last Updated : 23 Feb, 2024
The Levenshtein distance measures how different two strings are: it is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. Java's standard library has no built-in method for it, so this article implements it with a recursive function.
Prerequisites:
- Recursion
- Dynamic Programming
- String Manipulation
How Does this Algorithm Work?
The distance is defined by a recurrence over prefixes of the two strings. In the tabular (dynamic programming) form, create a 2D array of size (m+1) * (n+1), where m and n are the lengths of the two input strings. The base case: if either string is empty, the distance is the length of the other string.
if (len1 != 0 && len2 != 0) then proceed to the next steps
Next, fill the first row and first column of the 2D array with the number of edits needed to transform an empty string into the corresponding prefix of the other string (0, 1, 2, and so on). Then walk through the characters of both strings:
- For each pair of characters str1[i] and str2[j]:
  - If str1[i] is equal to str2[j], the substitution cost is 0; otherwise it is 1.
  - Set the array cell to the minimum of the following three options (shown for the final characters of "Java" and "JavaScript"):
    - Insertion: the distance between "Java" and "JavaScrip", plus 1 for inserting the last character.
    - Deletion: the distance between "Jav" and "JavaScript", plus 1 for deleting the last character.
    - Substitution: the distance between "Jav" and "JavaScrip", plus the substitution cost.
The value in the last cell of the array is the Levenshtein distance between the two strings. The recursive program below evaluates the same recurrence starting from the ends of the strings, without storing the array.
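The table-filling steps above can be sketched as follows. This is a minimal illustration, not part of the article's sample program; the class name LevenshteinTableSketch and the method name calculateDistanceTable are hypothetical names chosen for this example.

```java
public class LevenshteinTableSketch {

    // Hypothetical helper: bottom-up version of the recurrence.
    // dp[i][j] holds the distance between the first i characters of str1
    // and the first j characters of str2.
    public static int calculateDistanceTable(String str1, String str2) {
        int m = str1.length();
        int n = str2.length();
        int[][] dp = new int[m + 1][n + 1];

        // First column: deleting i characters turns a prefix into "".
        for (int i = 0; i <= m; i++) {
            dp[i][0] = i;
        }
        // First row: inserting j characters turns "" into a prefix.
        for (int j = 0; j <= n; j++) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                // Substitution is free when the characters already match.
                int cost = (str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1;
                int insertion = dp[i][j - 1] + 1;
                int deletion = dp[i - 1][j] + 1;
                int substitution = dp[i - 1][j - 1] + cost;
                dp[i][j] = Math.min(Math.min(insertion, deletion), substitution);
            }
        }
        // The last cell covers both complete strings.
        return dp[m][n];
    }

    public static void main(String[] args) {
        System.out.println(calculateDistanceTable("Java", "JavaScript"));
    }
}
```

Each cell is computed once, so this version does work proportional to m * n.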
Sample Program
Java
// Java Program to Calculate the Levenshtein distance
// Between two Strings in Java Using Recursion
public class GfGLevenshteinDistance {

    public static int calculateDistance(String str1, String str2) {
        return calculateDistanceRecursive(str1, str1.length(), str2, str2.length());
    }

    private static int calculateDistanceRecursive(String str1, int len1, String str2, int len2) {
        // Base cases: if either string is empty,
        // return the length of the other string
        if (len1 == 0) {
            return len2;
        }
        if (len2 == 0) {
            return len1;
        }

        // If the last characters of the strings are equal,
        // no operation is required
        if (str1.charAt(len1 - 1) == str2.charAt(len2 - 1)) {
            return calculateDistanceRecursive(str1, len1 - 1, str2, len2 - 1);
        }

        // Calculate cost of three possible operations:
        // Insertion, Deletion, and Substitution
        int insertionCost = calculateDistanceRecursive(str1, len1, str2, len2 - 1);
        int deletionCost = calculateDistanceRecursive(str1, len1 - 1, str2, len2);
        int substitutionCost = calculateDistanceRecursive(str1, len1 - 1, str2, len2 - 1);

        // Return minimum of the three
        // costs plus 1 (for the operation)
        return 1 + Math.min(Math.min(insertionCost, deletionCost), substitutionCost);
    }

    public static void main(String[] args) {
        String str1 = "Java";
        String str2 = "JavaScript";
        int distance = calculateDistance(str1, str2);
        System.out.println("Levenshtein distance between \"" + str1 + "\" and \"" + str2 + "\" is: " + distance);
    }
}
Output
Levenshtein distance between "Java" and "JavaScript" is: 6
Explanation of the above Program:
In the above example, the program calculates the Levenshtein distance between two strings:
- First, check the base cases: if one string is empty, the distance is equal to the length of the other string.
- Next, compare the last characters of the two prefixes. If they are equal, no operation is needed, and the function calls itself with both lengths decremented by 1.
- Otherwise, compute the cost of the three possible operations:
  - Insertion: recurse with len2 decremented by 1 and len1 unchanged.
  - Deletion: recurse with len1 decremented by 1 and len2 unchanged.
  - Substitution: recurse with both len1 and len2 decremented by 1.
- Return the minimum of the three costs plus 1 for the operation just applied.
- main() calls calculateDistance() with "Java" and "JavaScript", which starts the recursion at the full lengths of both strings. The result is 6 because the six characters of "Script" must be inserted.
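The plain recursion revisits the same (len1, len2) pair many times, so its running time grows exponentially with the lengths of the strings. One common remedy is to cache each result the first time it is computed (memoization). The sketch below assumes the same recurrence as the sample program; the class name LevenshteinMemoSketch, the method names calculateDistanceMemo and solve, and the memo array are hypothetical names introduced for this example.

```java
import java.util.Arrays;

public class LevenshteinMemoSketch {

    // Hypothetical entry point: same recurrence as the sample program,
    // with a cache indexed by (len1, len2).
    public static int calculateDistanceMemo(String str1, String str2) {
        int[][] memo = new int[str1.length() + 1][str2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return solve(str1, str1.length(), str2, str2.length(), memo);
    }

    private static int solve(String str1, int len1, String str2, int len2, int[][] memo) {
        // Base cases: one prefix is empty.
        if (len1 == 0) {
            return len2;
        }
        if (len2 == 0) {
            return len1;
        }
        // Reuse a cached answer when this pair was already solved.
        if (memo[len1][len2] != -1) {
            return memo[len1][len2];
        }

        int result;
        if (str1.charAt(len1 - 1) == str2.charAt(len2 - 1)) {
            // Last characters match: no edit needed at this position.
            result = solve(str1, len1 - 1, str2, len2 - 1, memo);
        } else {
            int insertionCost = solve(str1, len1, str2, len2 - 1, memo);
            int deletionCost = solve(str1, len1 - 1, str2, len2, memo);
            int substitutionCost = solve(str1, len1 - 1, str2, len2 - 1, memo);
            result = 1 + Math.min(Math.min(insertionCost, deletionCost), substitutionCost);
        }

        memo[len1][len2] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(calculateDistanceMemo("Java", "JavaScript"));
    }
}
```

With the cache, each (len1, len2) pair is solved at most once, so the number of distinct subproblems is bounded by (m+1) * (n+1). The recursion depth still grows with the combined length of the strings, so very long inputs may be better served by an iterative table.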