Bloom Filter & Cuckoo Filter
Use tiny approximate membership structures to reject absent keys cheaply before touching slower storage or larger indexes.
Probabilistic systems is nested under Systems & storage , so techniques in the broader family usually still apply here.
Interactive visualization
Use the controls to connect the idea to concrete operations before diving into the full write-up.
Bit array after inserting a key
bit 0
0
bit 1
1
bit 2
0
bit 3
1
bit 4
0
bit 5
1
bit 6
0
bit 7
0
Approximate membership, tiny memory
Bloom filters set a few bits per key. A zero bit proves absence; all ones only suggest possible presence.
Family
Systems & storage → Probabilistic systems
Approximate membership, cardinality, and frequency structures for high-volume systems.
Builds on
1 topic
Read these first if you want the surrounding context.
Unlocks
0 next topics
Use these follow-ups when you want to apply the current idea in a richer setting.
Learning paths
1
This topic appears in curated progressions so you can keep moving without guessing the next step.
Choose this over that
Choose the membership filter that matches the contract
Both structures answer "definitely absent or maybe present," but they optimize different operational needs.
Bloom filter
Choose this when: The simplest approximate membership structure is enough and you do not need deletions.
Choose something else when: Dynamic deletes or bucket-based fingerprint placement matter.
Cuckoo filter
Choose this when: You want approximate membership plus deletion support and are willing to manage relocation logic.
Choose something else when: The simplest bit-array design is preferable or insert failures must never trigger rebuild logic.
Count-Min Sketch
Choose this when: The question is about per-key frequency, not set membership.
Choose something else when: The system only needs to reject negative lookups quickly.
HyperLogLog
Choose this when: The goal is counting distinct keys rather than testing whether one key may exist.
Choose something else when: You still need per-key membership checks.
Problem
Sometimes you do not need an exact set. You need a tiny front-line guard that can answer:
is this key definitely absent, or do I need to check the expensive structure behind it?
That is enough to skip disk reads, remote calls, SSTable probes, or cache misses that would otherwise dominate the cost.
Intuition
Approximate membership structures trade exactness for space.
They are designed so a negative answer is very informative:
- Bloom filter: “definitely not present” or “maybe present”
- Cuckoo filter: same style of answer, but with better support for deletions
The key systems value is not the final answer. It is the expensive work you avoid after a negative check.
Bloom filter
A Bloom filter stores:
- a bit array of length
m khash functions
Insert
insert(x):
for i in 1 .. k:
bit[h_i(x) mod m] <- 1
Query
contains(x):
for i in 1 .. k:
if bit[h_i(x) mod m] == 0:
return DEFINITELY_ABSENT
return MAYBE_PRESENT
What errors are possible?
- false negatives: impossible
- false positives: possible
A query can look present because unrelated keys happened to set the same bits.
Choosing k
If you expect n inserts into m bits, a standard choice is:
And the false-positive rate is approximately:
So more bits per key reduce false positives, but the structure remains approximate.
Cuckoo filter
A Cuckoo filter stores short fingerprints inside small buckets instead of setting many global bits.
For a key x:
- compute fingerprint
f - compute first bucket
i1 = hash(x) - compute second bucket
i2 = i1 XOR hash(f)
The fingerprint can live in either bucket.
Query
contains(x):
f <- fingerprint(x)
i1 <- hash(x) mod numBuckets
i2 <- i1 XOR hash(f)
if f is in bucket[i1] or bucket[i2]:
return MAYBE_PRESENT
return DEFINITELY_ABSENT
Insert
insert(x):
f <- fingerprint(x)
i1 <- hash(x) mod numBuckets
i2 <- i1 XOR hash(f)
if bucket[i1] or bucket[i2] has space:
place f there
return SUCCESS
i <- random choice of i1 or i2
for kick in 1 .. MAX_KICKS:
swap f with a random fingerprint in bucket[i]
i <- i XOR hash(f)
if bucket[i] has space:
place f there
return SUCCESS
return REBUILD_NEEDED
Delete
delete(x):
f <- fingerprint(x)
remove f from bucket[i1] or bucket[i2] if present
That delete support is one of the biggest practical differences from plain Bloom filters.
Worked example
Imagine an LSM tree with hundreds of SSTables.
Before touching an SSTable for key invoice:42, the engine asks a membership filter whether the SSTable could possibly contain the key.
- if the filter says definitely absent, that SSTable is skipped immediately
- if the filter says maybe present, the engine performs the slower lookup
Even a modest false-positive rate is acceptable because the filter’s job is to eliminate most misses cheaply.
Complexity
| Structure | Insert | Query | Delete | Space style |
|---|---|---|---|---|
| Bloom filter | O(k) | O(k) | not supported in plain Bloom filter | Bit array |
| Cuckoo filter | expected O(1) | expected O(1) | expected O(1) | Buckets of short fingerprints |
Cuckoo insertion can occasionally fail after many displacements, which triggers a rebuild or resize.
Bloom vs. Cuckoo: when to choose which
| Question | Bloom filter | Cuckoo filter |
|---|---|---|
| Simplest implementation? | Yes | No |
| Deletions needed? | No, unless using a counting Bloom variant | Yes |
| Excellent negative checks? | Yes | Yes |
| Operational model | Set bits with k hashes | Place fingerprints in two candidate buckets |
| Common use | SSTable filters, network filters, one-way membership tests | Dynamic membership filters where deletion matters |
Key takeaways
- Bloom and Cuckoo filters solve approximate membership, not exact lookup.
- The main payoff is avoiding slower downstream work, not replacing the real storage layer.
- Bloom filters are simpler and have one-sided error: false positives but no false negatives.
- Cuckoo filters use fingerprints and relocations so they can support deletion while staying compact.
- These filters belong naturally next to LSM trees, caches, and distributed systems because those systems constantly benefit from cheap negative checks.
Practice problems
| Problem | Difficulty | Key idea |
|---|---|---|
| Bloom filter overview | Hard | Bit-array membership approximation |
| Cuckoo filter paper | Hard | Fingerprints plus relocations |
| RocksDB filter notes | Hard | Why storage engines care so much about negative checks |
Relation to other topics
- LSM Tree often uses Bloom-style filters to skip SSTables cheaply on misses.
- Count-Min Sketch and HyperLogLog are sibling sketches, but they estimate frequency and cardinality rather than membership.
- Hash Map explains why hashing can stand in for full key storage when approximation is acceptable.
Build on these first
These topics supply the mental model or underlying structure that this page assumes.
Related directions
These topics live nearby conceptually, even if they are not direct prerequisites.
Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
LSM Tree
Trade read complexity for write throughput by buffering writes in memory, flushing sorted runs, and compacting them in the background.
Count-Min Sketch
Estimate item frequencies in a stream with a tiny counter matrix, a few hash functions, and a guaranteed one-sided error.
HyperLogLog
Estimate the number of distinct items in a huge stream with tiny registers and the statistics of rare hash patterns.
More from Probabilistic systems
Stay in the same family when you want parallel variations of the same mental model.
Count-Min Sketch
Estimate item frequencies in a stream with a tiny counter matrix, a few hash functions, and a guaranteed one-sided error.
HyperLogLog
Estimate the number of distinct items in a huge stream with tiny registers and the statistics of rare hash patterns.
Paths that include this topic
Follow one of these sequences if you want a guided next step instead of open-ended browsing.
Probabilistic systems
Learn how approximate membership, frequency, and cardinality structures trade tiny errors for massive scale.
From the blog
Pair the atlas with recent writing from the rest of the site when you want a broader engineering perspective alongside the topic graph.