Dynamic Programming: Regular Expression Matching
Difficulty: Hard
Problem Statement
Implement regular expression matching with support for '.'
and '*'
.
Given an input string s
and a pattern p
, implement a function to determine if s
matches the pattern p
.
'.'
Matches any single character.'*'
Matches zero or more of the preceding element.
Note:
- The matching should cover the entire input string (not partial).
- The pattern
p
does not contain empty character classes like*
without a preceding character.
Implementation Details
Implement the following function:
def is_match(s: str, p: str) -> bool:
"""
Determine if the string s matches the pattern p.
Args:
s (str): The input string.
p (str): The pattern containing '.' and '*'.
Returns:
bool: True if s matches p, False otherwise.
"""
pass # Your implementation here
Constraints
- String Lengths: (0 \leq \text{len(s)} \leq 20)
- Pattern Lengths: (0 \leq \text{len(p)} \leq 30)
- Characters:
s
contains only lowercase English letters.p
contains lowercase English letters,.
and*
. - Valid Patterns: Patterns are well-formed; no consecutive
*
, and*
is always preceded by a character or.
.
Example Usage
print(is_match("aa", "a")) # Expected Output: False
print(is_match("aa", "a*")) # Expected Output: True
print(is_match("ab", ".*")) # Expected Output: True
print(is_match("aab", "c*a*b")) # Expected Output: True
print(is_match("mississippi", "mis*is*p*.")) # Expected Output: False
Test Cases
def test_is_match():
# Test Case 1: Simple mismatch
assert is_match("aa", "a") == False
# Test Case 2: '*' matches multiple preceding characters
assert is_match("aa", "a*") == True
# Test Case 3: '.' matches any character
assert is_match("ab", ".*") == True
# Test Case 4: Complex pattern with '*' and '.'
assert is_match("aab", "c*a*b") == True
# Test Case 5: Mismatch due to pattern
assert is_match("mississippi", "mis*is*p*.") == False
# Test Case 6: Empty string and pattern
assert is_match("", "") == True
# Test Case 7: Empty string with non-empty pattern
assert is_match("", "a*") == True
# Test Case 8: Non-empty string with empty pattern
assert is_match("a", "") == False
# Test Case 9: '*' cannot stand alone
assert is_match("a", "*") == False
# Test Case 10: Pattern ends with '*'
assert is_match("ab", ".*c") == False
print("All test cases pass!")
Expected Solution Approach
Use Dynamic Programming to solve this problem by considering the subproblems of matching substrings of s
and p
.
Algorithm Steps
-
Initialize DP Table:
- Create a 2D boolean table
dp
, wheredp[i][j]
represents whethers[0:i]
matchesp[0:j]
. - Initialize
dp[0][0] = True
, as empty strings match. - Initialize first row for patterns like
a*
ora*b*
.
- Create a 2D boolean table
-
Fill DP Table:
- Iterate over the string
s
and patternp
. - For each position
i
ins
andj
inp
:- If characters match or pattern has ‘.’:
dp[i][j] = dp[i-1][j-1]
- If pattern has ‘*’:
- Zero occurrence:
dp[i][j] = dp[i][j-2]
- One or more occurrences:
- If the preceding character in pattern matches current character in string or is ‘.’:
dp[i][j] = dp[i-1][j]
- If the preceding character in pattern matches current character in string or is ‘.’:
- Zero occurrence:
- Else:
dp[i][j] = False
- If characters match or pattern has ‘.’:
- Iterate over the string
-
Return Result:
- Return
dp[len(s)][len(p)]
.
- Return
Time Complexity
- Time Complexity: (O(N \times M)), where (N) is the length of
s
and (M) is the length ofp
. - Space Complexity: (O(N \times M)) for the DP table.
Implementation Hint
Here’s a skeleton to help you start:
def is_match(s: str, p: str) -> bool:
n, m = len(s), len(p)
dp = [[False]*(m+1) for _ in range(n+1)]
dp[0][0] = True
# Initialize first row for patterns like a*, a*b*, etc.
for j in range(2, m+1):
if p[j-1] == '*':
dp[0][j] = dp[0][j-2]
for i in range(1, n+1):
for j in range(1, m+1):
if p[j-1] == '.' or p[j-1] == s[i-1]:
dp[i][j] = dp[i-1][j-1]
elif p[j-1] == '*':
dp[i][j] = dp[i][j-2] # Zero occurrence
if p[j-2] == '.' or p[j-2] == s[i-1]:
dp[i][j] |= dp[i-1][j] # One or more occurrences
else:
dp[i][j] = False
return dp[n][m]
Learning Objectives
-
Master Dynamic Programming Techniques:
- Break down complex matching into manageable subproblems.
- Use DP tables to store intermediate results.
-
Understand Regular Expressions:
- Implement basic regex functionality.
- Handle special characters like
.
and*
.
-
Handle Edge Cases in String Matching:
- Manage empty strings and patterns.
- Ensure proper initialization of DP table.
-
Optimize Space Complexity (Optional):
- Reduce space usage from (O(N \times M)) to (O(M)) by reusing previous row data.
Real-World Applications
- Text Editors: Implement search functionality with wildcard patterns.
- Compiler Design: Pattern matching in lexical analysis.
- Data Validation: Validate input strings against patterns.
- Networking: Filter packets based on pattern matching in firewalls.