Tuesday, April 8, 2014

[LeetCode] Word Break II

Problem Statement (link):
Given a string s and a dictionary of words dict, add spaces in s to construct a sentence where each word is a valid dictionary word.
Return all such possible sentences.
For example, given
s = "catsanddog",
dict = ["cat", "cats", "and", "sand", "dog"].
A solution is ["cats and dog", "cat sand dog"].
Analysis:
This problem requires us to return all the possible combinations with dictionary words.

The first idea is to implement a recursive DFS algorithm. In which, consider each possible prefix as a node in a n-nary tree, where n is the possible word choices starting from next index. For example, in the given example, if we construct a tree like:

                                                              ""               --> NULL string as root node
                                                          /        \
                                                   "cat"         "cats"
                                                      /                \
                                             "sand"                "and"    
                                                  /                        \
                                           "dog"                       "dog"

At each node, we need to search the entire dictionary for next possible word (child nodes).

The time complexity of this algorithm is O(m*n^2) in worst case, where n is length of string s, and m is the length of the dictionary.

I implemented this algorithm in Sol 1. However, it got TLE from OJ. How come?

If we look into the algorithm carefully, we could see there are two places that we may improve:
1) Dictionary look up duplication. In the given example, we could see that the two nodes "dog" at the last level are same. However, the DFS will do a dictionary scan each time;
2) Un-necessary look up. Suppose we had a string/sub-string s = "Leetcode", even if dictionary has no word "L" or words starting with "L", the algorithm will search next letter "e" as well.

We could borrow the DP idea from Word Break I to improve the 2) problem, where dp[i] indicates if s[i : n-1] could be constructed by dictionary words. If not, we stop search search immediately and move on. This idea is called backtracking, the dp vector here serves as the stop condition in backtracking.

The dp index map is as follows:

string:         c  a  t  s  a  n  d  d  o  g
i:                 0  1  2  3  4  5  6  7  8  9
dp:              0  1  2  3  4  5  6  7  8  9  10 --> dp[10]==true serving as the initial condition

However, I was not able to solve 1) problem.

To sum up, the time complexity of the improved algorithm is still O(m*n^2), but it saves a lot of time by stop searching earlier according to the pre-constructed dp vector.

Code:
Sol 1 - Recursive DFS:
vector<string> wordBreak(string s, unordered_set<string> &dict) {
    vector<string> dp;  // Store the out sequence
    recur(dict, dp, s, "");
    return dp;
}
void recur(unordered_set<string>& dict, vector<string>& dp, string s, string res){
    for (int i=1; i<=s.length(); i++){   // length of prefix
        if (dict.find(s.substr(0, i))!=dict.end()) {
            if (i==s.length()) {
                res+=s.substr(0, i);
                dp.push_back(res);
                return;
            }
            recur(dict, dp, s.substr(i, s.length()-i), res+s.substr(0,i)+" ");
        }
    }
    return;
}

Sol 2 - DFS + Backtracking (realized by DP):
class Solution {
public:
    // recursion + dp
    vector<string> wordBreak(string s, unordered_set<string> &dict){
        int len = s.length();
        vector<string> out;
        vector<bool> dp(len+1, false);
        // indicates if s[i, n-1] can be represented by dict
        dp[len]=true;
        for (int i=len-1; i>=0; i--) {
            if (dict.find(s.substr(i, len))!=dict.end()) {
                dp[i]=true;
                continue;
            }
            for (int j=i+1; j<len; j++) {
                if (dp[j]==true && dict.find(s.substr(i,j-i))!=dict.end()) {
                    dp[i]=true;
                }
            }
        }
        // dp + recursion
        recur(dict, dp, out, s, "", 0);
        return out;
    }

    // st-current substr start index
    void recur(unordered_set<string>& dict, vector<bool>& dp, vector<string>& out, string s, string res, int st) {
        for (int i=1; i<=s.length(); i++) {
            if (dict.find(s.substr(0, i))!=dict.end() && dp[st]==true) {
                if (i==s.length()) {
                    res+=s.substr(0, i);
                    out.push_back(res);
                    return;
                }
                recur(dict, dp, out, s.substr(i,s.length()-i), res+s.substr(0, i)+" ", st+i);
            }
        }
    }
}

Take-aways:
- Consider DFS when asked to "find all", "return all possible".
- Consider backtracking to save time - what would be the stop condition?


No comments:

Post a Comment