Friday, April 4, 2014

[LeetCode] Edit Distance

Problem statement (link here):

Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character

Analysis:
This is a classic DP problem. The reason that DP would reduce time complexity is that this problem has overlapped sub-problems, i.e., given that we found edit distance of length-i words, if we want to further find the edit distance of length-j words, where j>i, we would only need to work on (j-i) portion instead of processing from the very beginning.

We construct a (m+1) x (n+1) 2D matrix with each entry represents a character from each word, where m, n represents the length of each word, the extra 1 space in each dimension is to represent the empty char.

e.g., string word1 = "rabb"; string word2 = "rac", we construct the matrix as follows, matrix entry [i][j] represents the min number of steps to change word1[:i] to word2[:j]

       dp =    _ | 0 | r  | a | b | b |
                  0 | 0 | 1 | 2 | 3 | 4 |
                  r  | 1 | 0 | 1 | 2 | 3 |
                  a | 2 | 1 | 0 | 1 | 2 |
                  c | 3 | 2 | 1 | 1 | 2 |

To recall, the three operations we have are: a) insert; b) delete; c) replace. Each operation costs 1 step.

If two chars are same, i.e., word1[i-1]==word2[j-1], we don't need to update dp[i][j], as dp[i][j] = dp[i-1][j-1];
If two chars are different, we can choose either of the three operations
- if we choose to insert a char into word1, the cost is dp[i][j-1] as after insertion we will compare the first (j-1) chars of word2 with word1, i.e., dp[i][j] = dp[i][j-1] + 1;
- if we choose to delete the current char from word1, the cost is dp[i-1][j] as after deletion we will compare the first (i-1) chars of word1 with word2, i.e., dp[i][j] = dp[i-1][j] + 1;
- if we choose to replace the current char in word1, the cost is dp[i][j] as after replacement we will compare the first (i-1) chars of word1 with the first (j-1) chars of word2, i.e., dp[i][j] = dp[i-1][j-1] + 1;
Then we choose the min cost of these three operations.

This process takes place iteratively through all chars in each words.

Code:
class Solution {
public:
    int minDistance(string word1, string word2) {
        if (word1==word2) return 0;
        vector<vector<int> > dp(word1.length()+1, vector<int>(word2.length()+1, 0));
        // takes care of first row and first col
        for (int i=1; i<=word1.length(); i++)
            dp[i][0]=i;
        for (int j=1; j<=word2.length(); j++)
            dp[0][j]=j;
        // iteration starts
        for (int i=1; i<=word1.length(); i++) {
            for (int j=1; j<=word2.length(); j++) {
                if (word1[i-1]==word2[j-1])
                    dp[i][j]=dp[i-1][j-1];
                else
                    dp[i][j]=min(dp[i][j-1], min(dp[i-1][j], dp[i-1][j-1]))+1;    // cost of insert, delete, replace
            }
        }
        return dp[word1.length()][word2.length()];
    }
};

Obviously, as we need build this 2D table, the time complexity of this algorithm is O(m*n), where m and n are the length of the two words. The space complexity is also O(m*n).

We could use one vector instead of one matrix to store the DP result, this potentially brings space complexity back to linear.

No comments:

Post a Comment