Aho-Corasick
Match many patterns in one pass by augmenting a trie with failure links. This is multi-pattern search done at scale.
Interactive visualization
Use the controls to connect the idea to concrete operations before diving into the full write-up.
Reading index 0
Patterns stored in one trie with failure links
Scan text once
Matches found so far
No completed pattern yet.
Many patterns, one pass
Aho-Corasick starts from a trie, then adds failure links so mismatches can fall back without restarting from scratch. That lets one scan of the text report every pattern occurrence.
Family
Strings
Prefix-aware structures and text-centric search strategies.
Builds on
1 topic
Read these first if you want the surrounding context.
Unlocks
0 next topics
Use these follow-ups when you want to apply the current idea in a richer setting.
Learning paths
1
This topic appears in curated progressions so you can keep moving without guessing the next step.
Problem
You have many patterns and one big text.
Examples:
- a blocklist of forbidden words
- many DNA motifs
- many keywords to detect in a document stream
Running KMP or a naive scan for every pattern separately is too expensive. You want one pass over the text to find all matches.
Core idea
Start with a trie containing every pattern.
Then add failure links:
- if the current edge does not exist
- jump to the longest proper suffix that is also a trie prefix
- continue from there instead of restarting from the root manually
This makes the trie behave like a finite automaton for many strings at once.
Why it is powerful
Each character of the text is processed once through a sequence of trie moves and occasional failure jumps. The fallback work is amortized, so the whole scan stays linear in the text length plus the number of matches.
That is the big win:
many patterns, one automaton, one scan
Construction outline
- insert every pattern into a trie
- perform a BFS over trie nodes
- compute a failure link for each node
- propagate output information so suffix matches are also reported
The BFS step is what turns the trie into the final matcher.
Complexity
If the total pattern length is and the text length is :
| Phase | Time |
|---|---|
| Build trie + failure links | or depending on implementation |
| Scan text |
When to use it
Use Aho-Corasick when:
- there are many patterns
- the text is large or streamed
- you need every occurrence, not just one pattern at a time
It is uncommon in everyday interview prep, but very important in real systems like filters, lexers, intrusion detection, and bioinformatics pipelines.
Key takeaways
- Aho-Corasick is a trie plus failure links
- It is the standard answer to multi-pattern matching
- BFS is part of the construction, which is a nice cross-family connection: string matching built with trie structure and graph-style traversal
- This is rare for beginner curricula, but it belongs in a serious string toolbox
Practice problems
| Problem | Difficulty | Key idea |
|---|---|---|
| Stream of Characters | 🔴 Hard | Trie-style streaming queries, close in spirit to Aho-Corasick |
| Aho-Corasick overview | 🔴 Hard | Build failure links and output transitions |
| Keyword Search references | 🔴 Hard | Multi-pattern text scanning |
Relation to other topics
- Trie is the base structure that stores all patterns together
- BFS often appears in the construction phase for failure links
- This is the natural “what comes next” topic after someone understands tries well
Build on these first
These topics supply the mental model or underlying structure that this page assumes.
Related directions
These topics live nearby conceptually, even if they are not direct prerequisites.
KMP (Knuth-Morris-Pratt)
Search for a pattern in linear time by reusing information about its own prefixes instead of restarting after mismatches.
Rabin-Karp
Search with rolling hashes so most windows can be rejected with arithmetic instead of full character comparisons.
Z-Algorithm
Compute how much of the global prefix matches at every position, then reuse the active Z-box to stay linear.
Suffix Automaton
Compress all substrings of a string into a linear-size automaton with suffix links.
More from Strings
Stay in the same family when you want parallel variations of the same mental model.
KMP (Knuth-Morris-Pratt)
Search for a pattern in linear time by reusing information about its own prefixes instead of restarting after mismatches.
Rabin-Karp
Search with rolling hashes so most windows can be rejected with arithmetic instead of full character comparisons.
Trie (Prefix Tree)
A tree-like data structure for efficient string storage and prefix-based retrieval. Essential for autocomplete, spell checking, and word search problems.
Z-Algorithm
Compute how much of the global prefix matches at every position, then reuse the active Z-box to stay linear.
Paths that include this topic
Follow one of these sequences if you want a guided next step instead of open-ended browsing.
String matching
Move from prefix reuse and rolling hashes to suffix-based indexing and automata.
From the blog
Pair the atlas with recent writing from the rest of the site when you want a broader engineering perspective alongside the topic graph.