🔴 Dynamic Programming: Regular Expression Matching

Dynamic Programming: Regular Expression Matching

Difficulty: Hard


Problem Statement

Implement regular expression matching with support for '.' and '*'.

Given an input string s and a pattern p, implement a function to determine if s matches the pattern p.

  • '.' Matches any single character.
  • '*' Matches zero or more of the preceding element.

Note:

  • The matching should cover the entire input string (not partial).
  • The pattern p does not contain empty character classes like * without a preceding character.

Implementation Details

Implement the following function:

def is_match(s: str, p: str) -> bool:
    """
    Determine if the string s matches the pattern p.

    Args:
        s (str): The input string.
        p (str): The pattern containing '.' and '*'.

    Returns:
        bool: True if s matches p, False otherwise.
    """
    pass  # Your implementation here

Constraints

  • String Lengths: (0 \leq \text{len(s)} \leq 20)
  • Pattern Lengths: (0 \leq \text{len(p)} \leq 30)
  • Characters: s contains only lowercase English letters. p contains lowercase English letters, . and *.
  • Valid Patterns: Patterns are well-formed; no consecutive *, and * is always preceded by a character or ..

Example Usage

print(is_match("aa", "a"))          # Expected Output: False
print(is_match("aa", "a*"))         # Expected Output: True
print(is_match("ab", ".*"))         # Expected Output: True
print(is_match("aab", "c*a*b"))     # Expected Output: True
print(is_match("mississippi", "mis*is*p*."))  # Expected Output: False

Test Cases

def test_is_match():
    # Test Case 1: Simple mismatch
    assert is_match("aa", "a") == False

    # Test Case 2: '*' matches multiple preceding characters
    assert is_match("aa", "a*") == True

    # Test Case 3: '.' matches any character
    assert is_match("ab", ".*") == True

    # Test Case 4: Complex pattern with '*' and '.'
    assert is_match("aab", "c*a*b") == True

    # Test Case 5: Mismatch due to pattern
    assert is_match("mississippi", "mis*is*p*.") == False

    # Test Case 6: Empty string and pattern
    assert is_match("", "") == True

    # Test Case 7: Empty string with non-empty pattern
    assert is_match("", "a*") == True

    # Test Case 8: Non-empty string with empty pattern
    assert is_match("a", "") == False

    # Test Case 9: '*' cannot stand alone
    assert is_match("a", "*") == False

    # Test Case 10: Pattern ends with '*'
    assert is_match("ab", ".*c") == False

    print("All test cases pass!")

Expected Solution Approach

Use Dynamic Programming to solve this problem by considering the subproblems of matching substrings of s and p.

Algorithm Steps

  1. Initialize DP Table:

    • Create a 2D boolean table dp, where dp[i][j] represents whether s[0:i] matches p[0:j].
    • Initialize dp[0][0] = True, as empty strings match.
    • Initialize first row for patterns like a* or a*b*.
  2. Fill DP Table:

    • Iterate over the string s and pattern p.
    • For each position i in s and j in p:
      • If characters match or pattern has ‘.’:
        • dp[i][j] = dp[i-1][j-1]
      • If pattern has ‘*’:
        • Zero occurrence: dp[i][j] = dp[i][j-2]
        • One or more occurrences:
          • If the preceding character in pattern matches current character in string or is ‘.’:
            • dp[i][j] = dp[i-1][j]
      • Else:
        • dp[i][j] = False
  3. Return Result:

    • Return dp[len(s)][len(p)].

Time Complexity

  • Time Complexity: (O(N \times M)), where (N) is the length of s and (M) is the length of p.
  • Space Complexity: (O(N \times M)) for the DP table.

Implementation Hint

Here’s a skeleton to help you start:

def is_match(s: str, p: str) -> bool:
    n, m = len(s), len(p)
    dp = [[False]*(m+1) for _ in range(n+1)]
    dp[0][0] = True

    # Initialize first row for patterns like a*, a*b*, etc.
    for j in range(2, m+1):
        if p[j-1] == '*':
            dp[0][j] = dp[0][j-2]

    for i in range(1, n+1):
        for j in range(1, m+1):
            if p[j-1] == '.' or p[j-1] == s[i-1]:
                dp[i][j] = dp[i-1][j-1]
            elif p[j-1] == '*':
                dp[i][j] = dp[i][j-2]  # Zero occurrence
                if p[j-2] == '.' or p[j-2] == s[i-1]:
                    dp[i][j] |= dp[i-1][j]  # One or more occurrences
            else:
                dp[i][j] = False

    return dp[n][m]

Learning Objectives

  1. Master Dynamic Programming Techniques:

    • Break down complex matching into manageable subproblems.
    • Use DP tables to store intermediate results.
  2. Understand Regular Expressions:

    • Implement basic regex functionality.
    • Handle special characters like . and *.
  3. Handle Edge Cases in String Matching:

    • Manage empty strings and patterns.
    • Ensure proper initialization of DP table.
  4. Optimize Space Complexity (Optional):

    • Reduce space usage from (O(N \times M)) to (O(M)) by reusing previous row data.

Real-World Applications

  • Text Editors: Implement search functionality with wildcard patterns.
  • Compiler Design: Pattern matching in lexical analysis.
  • Data Validation: Validate input strings against patterns.
  • Networking: Filter packets based on pattern matching in firewalls.