Friday, May 16, 2014

[LeetCode] Minimum Window Substring

Problem Statement (link):
Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).
For example,
S = "ADOBECODEBANC"
T = "ABC"
Minimum window is "BANC".
Note:
If there is no such window in S that covers all characters in T, return the emtpy string "".
If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.
Analysis:
We use two index pointers to maintain a sub-string window that contains the chars in T. Two hash tables are used, one contains all the chars in T as key field and their occurrences as value field; the other one is dynamic hash table that maintains chars in the sub-string window and their occurrences, these chars are only those appear in T. Further, an int is used to keep track of the number of matching chars in the sub-string window. This int is important as it closely associates with the movement of the two index pointers.

The pseudo-code goes like this:
Initialize parameters, two indexes == 0
Scan through T to update needtoFind hash table;
Start from the beginning of S, for each end value:
    if current char is not in T, continue;
    otherwise, add current char to hasFound hash table;
    if the occurrences of current char in hasFound is no larger than that in needtoFind, increase count;

    if count equals T size:
        if current char does not exist in T or its occurrences in hasFound is larger than that in needtoFind, we advance begin pointer, meanwhile, if current char exists in T and hasFound has larger occurrences, decrease the value in hasFound
        if current window is smaller, update the minimum string and its size

To better understand the algorithm, e.g., S = "DOBECAODEBAANC", T = "AABC", the red chars represent the dynamic window.

S
Variables

DOBECAODEBAANC
begin = 0, end = 0

DOBECAODEBAANC
begin = 0, end = 10Advance end until find a match
DOBECAODEBAANC
begin = 5, end = 10Advance begin 
DOBECAODEBAANC
begin = 5, end = 11count stays the same, advance end, increase hasFound['A']
DOBECAODEBAANC
begin = 5, end = 13Advance end, increase hasFound['C']
DOBECAODEBAANC
begin = 6, end = 13Advance begin, decrease hasFound['C']
DOBECAODEBAANCbegin = 9, end = 13Advance begin, decrease hasFound['A']

Take-aways:
In C++11, we could manipulate the nonexistent keys in unordered_map directly without initialize that key value, and the initial value of that key is 0. For instance, the following code is legal:

unordered_map<char, int> map;
int val = map['a'];   // val==0
map['b']--;     // map['b']==-1
int sz = map.size();   // sz==2

Code:
class Solution {
public:
    string minWindow(string S, string T) {
        int begin=0, end=0;  // for scanning S
        int m=INT_MAX;  // min window size
        string ms="";      // min string
        int count=0;    // matching num of chars in current window

        unordered_map<char, int> hasFound;      // for string S
        unordered_map<char, int> needtoFind;    // for string T

        for (int i=0; i<T.size(); i++)
            needtoFind[T[i]]++;

        for (int end=0; end<S.size(); end++) {
            // skip chars not in T
            if (needtoFind.find(S[end])==needtoFind.end()) continue;
            // o.w. add char into hasFound table
            hasFound[S[end]]++;
            if (hasFound[S[end]] <= needtoFind[S[end]]) count++;

            if (count==T.size()) {
                // advance begin pointer
                while (needtoFind.find(S[begin])==needtoFind.end() || hasFound[S[begin]]>needtoFind[S[begin]]) {
                    if (needtoFind.find(S[begin])!=needtoFind.end() && hasFound[S[begin]]>needtoFind[S[begin]]) hasFound[S[begin]]--;
                    begin++;
                }

                // update min string
                int winLen=end-begin+1;
                if (winLen<m) {
                    m=winLen;
                    ms=S.substr(begin, winLen);
                }
            }
        }
        return ms;
    }
};



No comments:

Post a Comment