Graph Algorithms: Alien Dictionary
Difficulty: Hard
Problem Statement
You are given a list of words from an alien language, sorted lexicographically according to the alien’s custom alphabet. Your task is to derive the order of letters in this alien language.
Note:
- If there is no valid order (i.e., the input is invalid), return an empty string.
- There may be multiple valid orders; any one of them is acceptable.
Implementation Details
Implement the following function:
from typing import List
def alien_order(words: List[str]) -> str:
"""
Determine the order of letters in an alien language.
Args:
words (List[str]): A list of words sorted lexicographically in the alien language.
Returns:
str: A string representing the letters in the alien alphabet in order.
Return an empty string if no valid order exists.
"""
pass # Your implementation here
Constraints
- Number of Words: (1 \leq \text{len(words)} \leq 10^5)
- Word Length: (1 \leq \text{len(words[i])} \leq 100)
- Total Characters: The total number of characters across all words does not exceed (10^6).
- Characters: Words consist of lowercase English letters (
'a'
to'z'
).
Example Usage
words = ["wrt", "wrf", "er", "ett", "rftt"]
print(alien_order(words)) # Expected Output: "wertf"
words = ["z", "x"]
print(alien_order(words)) # Expected Output: "zx"
words = ["z", "x", "z"]
print(alien_order(words)) # Expected Output: "" (No valid order)
Test Cases
def test_alien_order():
# Test Case 1: Basic ordering
words1 = ["wrt", "wrf", "er", "ett", "rftt"]
assert alien_order(words1) == "wertf"
# Test Case 2: Simple two-letter ordering
words2 = ["z", "x"]
assert alien_order(words2) == "zx"
# Test Case 3: No valid order due to cycle
words3 = ["z", "x", "z"]
assert alien_order(words3) == ""
# Test Case 4: All words are the same
words4 = ["abc", "abc"]
assert alien_order(words4) == "abc"
# Test Case 5: Multiple valid orders
words5 = ["abc", "ab"]
assert alien_order(words5) == "" # Invalid because shorter word comes after prefix
# Test Case 6: Disconnected letters
words6 = ["a", "b", "c"]
order = alien_order(words6)
assert set(order) == {'a', 'b', 'c'}
print("All test cases pass!")
Expected Solution Approach
To solve this problem, perform a Topological Sort on the characters using the given word list.
Algorithm Steps
-
Build the Graph:
- Create a graph where each node represents a unique character.
- For each pair of adjacent words, compare them character by character:
- Find the first differing character.
- Add a directed edge from the character in the first word to the character in the second word.
- If the first word is longer and is a prefix of the second word, the input is invalid (e.g.,
["abc", "ab"]
).
-
Detect Cycles and Perform Topological Sort:
- Use Depth-First Search (DFS) to perform a topological sort.
- Keep track of visited nodes to detect cycles:
- Use three states: unvisited, visiting, and visited.
- If a node is encountered that is currently being visited, a cycle exists, and the input is invalid.
-
Return Result:
- If the topological sort completes without detecting a cycle, return the characters in reverse order of finishing times.
Time Complexity
- Time Complexity: (O(C)), where (C) is the total length of all words.
- Space Complexity: (O(1)) for the graph and visited states (since the number of unique letters is at most 26).
Implementation Hint
Here’s a skeleton to help you start:
from collections import defaultdict
def alien_order(words: List[str]) -> str:
graph = defaultdict(set)
in_degree = {}
for word in words:
for char in word:
in_degree[char] = 0
# Build the graph
for i in range(len(words) - 1):
first, second = words[i], words[i+1]
min_len = min(len(first), len(second))
if len(first) > len(second) and first.startswith(second):
return ""
for j in range(min_len):
if first[j] != second[j]:
if second[j] not in graph[first[j]]:
graph[first[j]].add(second[j])
in_degree[second[j]] += 1
break
# Topological sort using BFS
from collections import deque
queue = deque([char for char in in_degree if in_degree[char] == 0])
result = []
while queue:
char = queue.popleft()
result.append(char)
for neighbor in graph[char]:
in_degree[neighbor] -=1
if in_degree[neighbor] == 0:
queue.append(neighbor)
if len(result) != len(in_degree):
return ""
return "".join(result)
Learning Objectives
-
Understand Topological Sorting:
- Apply topological sort to derive ordering from dependencies.
- Recognize when a cycle invalidates the ordering.
-
Graph Construction from Data:
- Build graphs based on relationships inferred from data.
- Handle special cases like prefix relationships.
-
Cycle Detection in Graphs:
- Use DFS or BFS to detect cycles in directed graphs.
- Understand the implications of cycles on ordering.
-
Optimize for Large Inputs:
- Efficiently process large datasets within time and space constraints.
- Use appropriate data structures like sets and queues.
Real-World Applications
- Language Processing: Decipher unknown alphabets based on known word orders.
- Task Scheduling: Order tasks with dependencies that are inferred rather than explicitly stated.
- Version Control Systems: Determine file dependencies based on usage in commits.
- Dependency Resolution: Infer module or package order based on usage patterns.