Dynamic Programming: Word Break Problem
Difficulty: Medium
Problem Statement
Given a string s
and a dictionary of words wordDict
, determine if s
can be segmented into a space-separated sequence of one or more dictionary words.
Implementation Details
Implement the following function:
def word_break(s: str, wordDict: List[str]) -> bool:
"""
Determine if the string s can be segmented into a sequence of one or more dictionary words.
Args:
s (str): The input string to be segmented.
wordDict (List[str]): The list of dictionary words.
Returns:
bool: True if s can be segmented into a sequence of one or more dictionary words, False otherwise.
"""
pass # Your implementation here
Constraints
- String Length: (1 \leq \text{len(s)} \leq 10^4)
- Number of Words in Dictionary: (1 \leq \text{len(wordDict)} \leq 10^3)
- Word Lengths: All dictionary words have a length between 1 and 20 characters.
- String and Dictionary Words: Consist of lowercase English letters (
'a'
to'z'
).
Example Usage
s = "leetcode"
wordDict = ["leet", "code"]
print(word_break(s, wordDict)) # Expected Output: True
s = "applepenapple"
wordDict = ["apple", "pen"]
print(word_break(s, wordDict)) # Expected Output: True
s = "catsandog"
wordDict = ["cats", "dog", "sand", "and", "cat"]
print(word_break(s, wordDict)) # Expected Output: False
Test Cases
def test_word_break():
# Test Case 1: Simple segmentation
assert word_break("leetcode", ["leet", "code"]) == True
# Test Case 2: Multiple segmentations possible
assert word_break("applepenapple", ["apple", "pen"]) == True
# Test Case 3: No valid segmentation
assert word_break("catsandog", ["cats", "dog", "sand", "and", "cat"]) == False
# Test Case 4: Overlapping words
assert word_break("cars", ["car", "ca", "rs"]) == True
# Test Case 5: Empty string
assert word_break("", ["a", "b"]) == True
# Test Case 6: WordDict with one word
assert word_break("aaaaaaa", ["aaaa", "aaa"]) == True
# Test Case 7: String cannot be segmented fully
assert word_break("aaaaaaa", ["aaaa", "aa"]) == False
print("All test cases pass!")
Expected Solution Approach
To solve this problem, use Dynamic Programming. The idea is to determine at each index i
whether the substring s[0:i]
can be segmented into dictionary words.
Algorithm Steps
-
Initialize DP Array:
- Create a boolean array
dp
of lengthlen(s) + 1
, wheredp[i]
isTrue
ifs[0:i]
can be segmented,False
otherwise. - Set
dp[0] = True
because an empty string can always be segmented.
- Create a boolean array
-
Iterate Over String Lengths:
- For
i
from 1 tolen(s)
:- For
j
from 0 toi-1
:- If
dp[j]
isTrue
ands[j:i]
is inwordDict
, setdp[i] = True
and break.
- If
- For
- For
-
Return Result:
- Return
dp[len(s)]
.
- Return
Optimizations
- Use a Set for wordDict:
- Convert
wordDict
to a set for O(1) look-up times.
- Convert
- Prune Unnecessary Checks:
- Keep track of the maximum word length in
wordDict
to limit the inner loop.
- Keep track of the maximum word length in
Time Complexity
- Time Complexity: (O(N^2)), where (N) is the length of the string
s
. - Space Complexity: (O(N)) for the DP array.
Implementation Hint
Here’s a skeleton to help you start:
def word_break(s: str, wordDict: List[str]) -> bool:
word_set = set(wordDict)
max_len = max((len(word) for word in wordDict), default=0)
dp = [False] * (len(s) + 1)
dp[0] = True
for i in range(1, len(s) + 1):
for j in range(max(0, i - max_len), i):
if dp[j] and s[j:i] in word_set:
dp[i] = True
break
return dp[len(s)]
Learning Objectives
-
Understand Dynamic Programming Concepts:
- Break down complex problems into simpler subproblems.
- Recognize overlapping subproblems and optimal substructure.
-
Efficient String Processing:
- Optimize string operations using sets and limiting iterations.
- Utilize data structures to improve lookup times.
-
Handle Edge Cases:
- Manage scenarios with empty strings or words.
- Consider overlapping words and prefixes.
-
Analyze Algorithm Complexity:
- Evaluate time and space complexity.
- Optimize the solution to handle large inputs efficiently.
Real-World Applications
- Natural Language Processing (NLP): Tokenize sentences into words for language modeling.
- Spell Checking and Correction: Identify possible segmentations of concatenated words.
- Input Validation: Verify if user inputs conform to a predefined set of tokens.
- Text Segmentation: Break continuous text without spaces into meaningful words, e.g., in languages like Chinese.