🔴 Graph Algorithms: Alien Dictionary

Graph Algorithms: Alien Dictionary

Difficulty: Hard


Problem Statement

You are given a list of words from an alien language, sorted lexicographically according to the alien’s custom alphabet. Your task is to derive the order of letters in this alien language.

Note:

  • If there is no valid order (i.e., the input is invalid), return an empty string.
  • There may be multiple valid orders; any one of them is acceptable.

Implementation Details

Implement the following function:

from typing import List

def alien_order(words: List[str]) -> str:
    """
    Determine the order of letters in an alien language.

    Args:
        words (List[str]): A list of words sorted lexicographically in the alien language.

    Returns:
        str: A string representing the letters in the alien alphabet in order.
             Return an empty string if no valid order exists.
    """
    pass  # Your implementation here

Constraints

  • Number of Words: (1 \leq \text{len(words)} \leq 10^5)
  • Word Length: (1 \leq \text{len(words[i])} \leq 100)
  • Total Characters: The total number of characters across all words does not exceed (10^6).
  • Characters: Words consist of lowercase English letters ('a' to 'z').

Example Usage

words = ["wrt", "wrf", "er", "ett", "rftt"]
print(alien_order(words))  # Expected Output: "wertf"

words = ["z", "x"]
print(alien_order(words))  # Expected Output: "zx"

words = ["z", "x", "z"]
print(alien_order(words))  # Expected Output: "" (No valid order)

Test Cases

def test_alien_order():
    # Test Case 1: Basic ordering
    words1 = ["wrt", "wrf", "er", "ett", "rftt"]
    assert alien_order(words1) == "wertf"

    # Test Case 2: Simple two-letter ordering
    words2 = ["z", "x"]
    assert alien_order(words2) == "zx"

    # Test Case 3: No valid order due to cycle
    words3 = ["z", "x", "z"]
    assert alien_order(words3) == ""

    # Test Case 4: All words are the same
    words4 = ["abc", "abc"]
    assert alien_order(words4) == "abc"

    # Test Case 5: Multiple valid orders
    words5 = ["abc", "ab"]
    assert alien_order(words5) == ""  # Invalid because shorter word comes after prefix

    # Test Case 6: Disconnected letters
    words6 = ["a", "b", "c"]
    order = alien_order(words6)
    assert set(order) == {'a', 'b', 'c'}

    print("All test cases pass!")

Expected Solution Approach

To solve this problem, perform a Topological Sort on the characters using the given word list.

Algorithm Steps

  1. Build the Graph:

    • Create a graph where each node represents a unique character.
    • For each pair of adjacent words, compare them character by character:
      • Find the first differing character.
      • Add a directed edge from the character in the first word to the character in the second word.
      • If the first word is longer and is a prefix of the second word, the input is invalid (e.g., ["abc", "ab"]).
  2. Detect Cycles and Perform Topological Sort:

    • Use Depth-First Search (DFS) to perform a topological sort.
    • Keep track of visited nodes to detect cycles:
      • Use three states: unvisited, visiting, and visited.
      • If a node is encountered that is currently being visited, a cycle exists, and the input is invalid.
  3. Return Result:

    • If the topological sort completes without detecting a cycle, return the characters in reverse order of finishing times.

Time Complexity

  • Time Complexity: (O(C)), where (C) is the total length of all words.
  • Space Complexity: (O(1)) for the graph and visited states (since the number of unique letters is at most 26).

Implementation Hint

Here’s a skeleton to help you start:

from collections import defaultdict

def alien_order(words: List[str]) -> str:
    graph = defaultdict(set)
    in_degree = {}
    for word in words:
        for char in word:
            in_degree[char] = 0

    # Build the graph
    for i in range(len(words) - 1):
        first, second = words[i], words[i+1]
        min_len = min(len(first), len(second))
        if len(first) > len(second) and first.startswith(second):
            return ""
        for j in range(min_len):
            if first[j] != second[j]:
                if second[j] not in graph[first[j]]:
                    graph[first[j]].add(second[j])
                    in_degree[second[j]] += 1
                break

    # Topological sort using BFS
    from collections import deque
    queue = deque([char for char in in_degree if in_degree[char] == 0])
    result = []

    while queue:
        char = queue.popleft()
        result.append(char)
        for neighbor in graph[char]:
            in_degree[neighbor] -=1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(result) != len(in_degree):
        return ""
    return "".join(result)

Learning Objectives

  1. Understand Topological Sorting:

    • Apply topological sort to derive ordering from dependencies.
    • Recognize when a cycle invalidates the ordering.
  2. Graph Construction from Data:

    • Build graphs based on relationships inferred from data.
    • Handle special cases like prefix relationships.
  3. Cycle Detection in Graphs:

    • Use DFS or BFS to detect cycles in directed graphs.
    • Understand the implications of cycles on ordering.
  4. Optimize for Large Inputs:

    • Efficiently process large datasets within time and space constraints.
    • Use appropriate data structures like sets and queues.

Real-World Applications

  • Language Processing: Decipher unknown alphabets based on known word orders.
  • Task Scheduling: Order tasks with dependencies that are inferred rather than explicitly stated.
  • Version Control Systems: Determine file dependencies based on usage in commits.
  • Dependency Resolution: Infer module or package order based on usage patterns.