🟡 Dynamic Programming: Word Break Problem

stemaway · November 13, 2024, 3:55am

Dynamic Programming: Word Break Problem

Difficulty: Medium

Problem Statement

Given a string s and a dictionary of words wordDict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.

Implementation Details

Implement the following function:

def word_break(s: str, wordDict: List[str]) -> bool:
    """
    Determine if the string s can be segmented into a sequence of one or more dictionary words.

    Args:
        s (str): The input string to be segmented.
        wordDict (List[str]): The list of dictionary words.

    Returns:
        bool: True if s can be segmented into a sequence of one or more dictionary words, False otherwise.
    """
    pass  # Your implementation here

Constraints

String Length: (1 \leq \text{len(s)} \leq 10^4)
Number of Words in Dictionary: (1 \leq \text{len(wordDict)} \leq 10^3)
Word Lengths: All dictionary words have a length between 1 and 20 characters.
String and Dictionary Words: Consist of lowercase English letters ('a' to 'z').

Example Usage

s = "leetcode"
wordDict = ["leet", "code"]
print(word_break(s, wordDict))  # Expected Output: True

s = "applepenapple"
wordDict = ["apple", "pen"]
print(word_break(s, wordDict))  # Expected Output: True

s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
print(word_break(s, wordDict))  # Expected Output: False

Test Cases

def test_word_break():
    # Test Case 1: Simple segmentation
    assert word_break("leetcode", ["leet", "code"]) == True

    # Test Case 2: Multiple segmentations possible
    assert word_break("applepenapple", ["apple", "pen"]) == True

    # Test Case 3: No valid segmentation
    assert word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) == False

    # Test Case 4: Overlapping words
    assert word_break("cars", ["car", "ca", "rs"]) == True

    # Test Case 5: Empty string
    assert word_break("", ["a", "b"]) == True

    # Test Case 6: WordDict with one word
    assert word_break("aaaaaaa", ["aaaa", "aaa"]) == True

    # Test Case 7: String cannot be segmented fully
    assert word_break("aaaaaaa", ["aaaa", "aa"]) == False

    print("All test cases pass!")

Expected Solution Approach

To solve this problem, use Dynamic Programming. The idea is to determine at each index i whether the substring s[0:i] can be segmented into dictionary words.

Algorithm Steps

Initialize DP Array:
- Create a boolean array dp of length len(s) + 1, where dp[i] is True if s[0:i] can be segmented, False otherwise.
- Set dp[0] = True because an empty string can always be segmented.
Iterate Over String Lengths:
- For i from 1 to len(s):
  - For j from 0 to i-1:
    - If dp[j] is True and s[j:i] is in wordDict, set dp[i] = True and break.
Return Result:
- Return dp[len(s)].

Optimizations

Use a Set for wordDict:
- Convert wordDict to a set for O(1) look-up times.
Prune Unnecessary Checks:
- Keep track of the maximum word length in wordDict to limit the inner loop.

Time Complexity

Time Complexity: (O(N^2)), where (N) is the length of the string s.
Space Complexity: (O(N)) for the DP array.

Implementation Hint

Here’s a skeleton to help you start:

def word_break(s: str, wordDict: List[str]) -> bool:
    word_set = set(wordDict)
    max_len = max((len(word) for word in wordDict), default=0)
    dp = [False] * (len(s) + 1)
    dp[0] = True

    for i in range(1, len(s) + 1):
        for j in range(max(0, i - max_len), i):
            if dp[j] and s[j:i] in word_set:
                dp[i] = True
                break

    return dp[len(s)]

Learning Objectives

Understand Dynamic Programming Concepts:
- Break down complex problems into simpler subproblems.
- Recognize overlapping subproblems and optimal substructure.
Efficient String Processing:
- Optimize string operations using sets and limiting iterations.
- Utilize data structures to improve lookup times.
Handle Edge Cases:
- Manage scenarios with empty strings or words.
- Consider overlapping words and prefixes.
Analyze Algorithm Complexity:
- Evaluate time and space complexity.
- Optimize the solution to handle large inputs efficiently.

Real-World Applications

Natural Language Processing (NLP): Tokenize sentences into words for language modeling.
Spell Checking and Correction: Identify possible segmentations of concatenated words.
Input Validation: Verify if user inputs conform to a predefined set of tokens.
Text Segmentation: Break continuous text without spaces into meaningful words, e.g., in languages like Chinese.