🟡 Dynamic Programming: Word Break Problem

Dynamic Programming: Word Break Problem

Difficulty: Medium


Problem Statement

Given a string s and a dictionary of words wordDict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.


Implementation Details

Implement the following function:

def word_break(s: str, wordDict: List[str]) -> bool:
    """
    Determine if the string s can be segmented into a sequence of one or more dictionary words.

    Args:
        s (str): The input string to be segmented.
        wordDict (List[str]): The list of dictionary words.

    Returns:
        bool: True if s can be segmented into a sequence of one or more dictionary words, False otherwise.
    """
    pass  # Your implementation here

Constraints

  • String Length: (1 \leq \text{len(s)} \leq 10^4)
  • Number of Words in Dictionary: (1 \leq \text{len(wordDict)} \leq 10^3)
  • Word Lengths: All dictionary words have a length between 1 and 20 characters.
  • String and Dictionary Words: Consist of lowercase English letters ('a' to 'z').

Example Usage

s = "leetcode"
wordDict = ["leet", "code"]
print(word_break(s, wordDict))  # Expected Output: True

s = "applepenapple"
wordDict = ["apple", "pen"]
print(word_break(s, wordDict))  # Expected Output: True

s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
print(word_break(s, wordDict))  # Expected Output: False

Test Cases

def test_word_break():
    # Test Case 1: Simple segmentation
    assert word_break("leetcode", ["leet", "code"]) == True

    # Test Case 2: Multiple segmentations possible
    assert word_break("applepenapple", ["apple", "pen"]) == True

    # Test Case 3: No valid segmentation
    assert word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) == False

    # Test Case 4: Overlapping words
    assert word_break("cars", ["car", "ca", "rs"]) == True

    # Test Case 5: Empty string
    assert word_break("", ["a", "b"]) == True

    # Test Case 6: WordDict with one word
    assert word_break("aaaaaaa", ["aaaa", "aaa"]) == True

    # Test Case 7: String cannot be segmented fully
    assert word_break("aaaaaaa", ["aaaa", "aa"]) == False

    print("All test cases pass!")

Expected Solution Approach

To solve this problem, use Dynamic Programming. The idea is to determine at each index i whether the substring s[0:i] can be segmented into dictionary words.

Algorithm Steps

  1. Initialize DP Array:

    • Create a boolean array dp of length len(s) + 1, where dp[i] is True if s[0:i] can be segmented, False otherwise.
    • Set dp[0] = True because an empty string can always be segmented.
  2. Iterate Over String Lengths:

    • For i from 1 to len(s):
      • For j from 0 to i-1:
        • If dp[j] is True and s[j:i] is in wordDict, set dp[i] = True and break.
  3. Return Result:

    • Return dp[len(s)].

Optimizations

  • Use a Set for wordDict:
    • Convert wordDict to a set for O(1) look-up times.
  • Prune Unnecessary Checks:
    • Keep track of the maximum word length in wordDict to limit the inner loop.

Time Complexity

  • Time Complexity: (O(N^2)), where (N) is the length of the string s.
  • Space Complexity: (O(N)) for the DP array.

Implementation Hint

Here’s a skeleton to help you start:

def word_break(s: str, wordDict: List[str]) -> bool:
    word_set = set(wordDict)
    max_len = max((len(word) for word in wordDict), default=0)
    dp = [False] * (len(s) + 1)
    dp[0] = True

    for i in range(1, len(s) + 1):
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[len(s)]

Learning Objectives

  1. Understand Dynamic Programming Concepts:

    • Break down complex problems into simpler subproblems.
    • Recognize overlapping subproblems and optimal substructure.
  2. Efficient String Processing:

    • Optimize string operations using sets and limiting iterations.
    • Utilize data structures to improve lookup times.
  3. Handle Edge Cases:

    • Manage scenarios with empty strings or words.
    • Consider overlapping words and prefixes.
  4. Analyze Algorithm Complexity:

    • Evaluate time and space complexity.
    • Optimize the solution to handle large inputs efficiently.

Real-World Applications

  • Natural Language Processing (NLP): Tokenize sentences into words for language modeling.
  • Spell Checking and Correction: Identify possible segmentations of concatenated words.
  • Input Validation: Verify if user inputs conform to a predefined set of tokens.
  • Text Segmentation: Break continuous text without spaces into meaningful words, e.g., in languages like Chinese.