Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
Analysis:
This is a classic DP problem. The reason that DP would reduce time complexity is that this problem has overlapped sub-problems, i.e., given that we found edit distance of length-i words, if we want to further find the edit distance of length-j words, where j>i, we would only need to work on (j-i) portion instead of processing from the very beginning.
We construct a (m+1) x (n+1) 2D matrix with each entry represents a character from each word, where m, n represents the length of each word, the extra 1 space in each dimension is to represent the empty char.
e.g., string word1 = "rabb"; string word2 = "rac", we construct the matrix as follows, matrix entry [i][j] represents the min number of steps to change word1[:i] to word2[:j]
dp = _ | 0 | r | a | b | b |
0 | 0 | 1 | 2 | 3 | 4 |
r | 1 | 0 | 1 | 2 | 3 |
a | 2 | 1 | 0 | 1 | 2 |
c | 3 | 2 | 1 | 1 | 2 |
To recall, the three operations we have are: a) insert; b) delete; c) replace. Each operation costs 1 step.
If two chars are same, i.e., word1[i-1]==word2[j-1], we don't need to update dp[i][j], as dp[i][j] = dp[i-1][j-1];
If two chars are different, we can choose either of the three operations
- if we choose to insert a char into word1, the cost is dp[i][j-1] as after insertion we will compare the first (j-1) chars of word2 with word1, i.e., dp[i][j] = dp[i][j-1] + 1;
- if we choose to delete the current char from word1, the cost is dp[i-1][j] as after deletion we will compare the first (i-1) chars of word1 with word2, i.e., dp[i][j] = dp[i-1][j] + 1;
- if we choose to replace the current char in word1, the cost is dp[i][j] as after replacement we will compare the first (i-1) chars of word1 with the first (j-1) chars of word2, i.e., dp[i][j] = dp[i-1][j-1] + 1;
Then we choose the min cost of these three operations.
This process takes place iteratively through all chars in each words.
Code:
class Solution { public: int minDistance(string word1, string word2) { if (word1==word2) return 0; vector<vector<int> > dp(word1.length()+1, vector<int>(word2.length()+1, 0)); // takes care of first row and first col for (int i=1; i<=word1.length(); i++) dp[i][0]=i; for (int j=1; j<=word2.length(); j++) dp[0][j]=j; // iteration starts for (int i=1; i<=word1.length(); i++) { for (int j=1; j<=word2.length(); j++) { if (word1[i-1]==word2[j-1]) dp[i][j]=dp[i-1][j-1]; else dp[i][j]=min(dp[i][j-1], min(dp[i-1][j], dp[i-1][j-1]))+1; // cost of insert, delete, replace } } return dp[word1.length()][word2.length()]; } };
Obviously, as we need build this 2D table, the time complexity of this algorithm is O(m*n), where m and n are the length of the two words. The space complexity is also O(m*n).
We could use one vector instead of one matrix to store the DP result, this potentially brings space complexity back to linear.
No comments:
Post a Comment