Skip to main content
Back to the DSA atlas
Systems & storageProbabilistic systems Data structure Advanced

Bloom Filter & Cuckoo Filter

Use tiny approximate membership structures to reject absent keys cheaply before touching slower storage or larger indexes.

Probabilistic systems is nested under Systems & storage , so techniques in the broader family usually still apply here.

filtermembershipprobabilisticrare

Interactive visualization

Use the controls to connect the idea to concrete operations before diving into the full write-up.

Bit array after inserting a key

bit 0

0

bit 1

1

bit 2

0

bit 3

1

bit 4

0

bit 5

1

bit 6

0

bit 7

0

Approximate membership, tiny memory

Bloom filters set a few bits per key. A zero bit proves absence; all ones only suggest possible presence.

Family

Systems & storage → Probabilistic systems

Approximate membership, cardinality, and frequency structures for high-volume systems.

Builds on

1 topic

Read these first if you want the surrounding context.

Unlocks

0 next topics

Use these follow-ups when you want to apply the current idea in a richer setting.

Learning paths

1

This topic appears in curated progressions so you can keep moving without guessing the next step.

Choose this over that

Choose the membership filter that matches the contract

Both structures answer "definitely absent or maybe present," but they optimize different operational needs.

Filter variant

Bloom filter

Choose this when: The simplest approximate membership structure is enough and you do not need deletions.

Choose something else when: Dynamic deletes or bucket-based fingerprint placement matter.

Filter variant

Cuckoo filter

Choose this when: You want approximate membership plus deletion support and are willing to manage relocation logic.

Choose something else when: The simplest bit-array design is preferable or insert failures must never trigger rebuild logic.

Probabilistic systems Open topic

Count-Min Sketch

Choose this when: The question is about per-key frequency, not set membership.

Choose something else when: The system only needs to reject negative lookups quickly.

Probabilistic systems Open topic

HyperLogLog

Choose this when: The goal is counting distinct keys rather than testing whether one key may exist.

Choose something else when: You still need per-key membership checks.

Problem

Sometimes you do not need an exact set. You need a tiny front-line guard that can answer:

is this key definitely absent, or do I need to check the expensive structure behind it?

That is enough to skip disk reads, remote calls, SSTable probes, or cache misses that would otherwise dominate the cost.

Intuition

Approximate membership structures trade exactness for space.

They are designed so a negative answer is very informative:

  • Bloom filter: “definitely not present” or “maybe present”
  • Cuckoo filter: same style of answer, but with better support for deletions

The key systems value is not the final answer. It is the expensive work you avoid after a negative check.

Bloom filter

A Bloom filter stores:

  • a bit array of length m
  • k hash functions

Insert

insert(x):
    for i in 1 .. k:
        bit[h_i(x) mod m] <- 1

Query

contains(x):
    for i in 1 .. k:
        if bit[h_i(x) mod m] == 0:
            return DEFINITELY_ABSENT
    return MAYBE_PRESENT

What errors are possible?

  • false negatives: impossible
  • false positives: possible

A query can look present because unrelated keys happened to set the same bits.

Choosing k

If you expect n inserts into m bits, a standard choice is:

kmnln2 k \approx \frac{m}{n} \ln 2

And the false-positive rate is approximately:

(1ekn/m)k\left(1 - e^{-kn/m}\right)^k

So more bits per key reduce false positives, but the structure remains approximate.

Cuckoo filter

A Cuckoo filter stores short fingerprints inside small buckets instead of setting many global bits.

For a key x:

  • compute fingerprint f
  • compute first bucket i1 = hash(x)
  • compute second bucket i2 = i1 XOR hash(f)

The fingerprint can live in either bucket.

Query

contains(x):
    f <- fingerprint(x)
    i1 <- hash(x) mod numBuckets
    i2 <- i1 XOR hash(f)
    if f is in bucket[i1] or bucket[i2]:
        return MAYBE_PRESENT
    return DEFINITELY_ABSENT

Insert

insert(x):
    f <- fingerprint(x)
    i1 <- hash(x) mod numBuckets
    i2 <- i1 XOR hash(f)

    if bucket[i1] or bucket[i2] has space:
        place f there
        return SUCCESS

    i <- random choice of i1 or i2
    for kick in 1 .. MAX_KICKS:
        swap f with a random fingerprint in bucket[i]
        i <- i XOR hash(f)
        if bucket[i] has space:
            place f there
            return SUCCESS

    return REBUILD_NEEDED

Delete

delete(x):
    f <- fingerprint(x)
    remove f from bucket[i1] or bucket[i2] if present

That delete support is one of the biggest practical differences from plain Bloom filters.

Worked example

Imagine an LSM tree with hundreds of SSTables.

Before touching an SSTable for key invoice:42, the engine asks a membership filter whether the SSTable could possibly contain the key.

  • if the filter says definitely absent, that SSTable is skipped immediately
  • if the filter says maybe present, the engine performs the slower lookup

Even a modest false-positive rate is acceptable because the filter’s job is to eliminate most misses cheaply.

Complexity

StructureInsertQueryDeleteSpace style
Bloom filterO(k)O(k)not supported in plain Bloom filterBit array
Cuckoo filterexpected O(1)expected O(1)expected O(1)Buckets of short fingerprints

Cuckoo insertion can occasionally fail after many displacements, which triggers a rebuild or resize.

Bloom vs. Cuckoo: when to choose which

QuestionBloom filterCuckoo filter
Simplest implementation?YesNo
Deletions needed?No, unless using a counting Bloom variantYes
Excellent negative checks?YesYes
Operational modelSet bits with k hashesPlace fingerprints in two candidate buckets
Common useSSTable filters, network filters, one-way membership testsDynamic membership filters where deletion matters

Key takeaways

  • Bloom and Cuckoo filters solve approximate membership, not exact lookup.
  • The main payoff is avoiding slower downstream work, not replacing the real storage layer.
  • Bloom filters are simpler and have one-sided error: false positives but no false negatives.
  • Cuckoo filters use fingerprints and relocations so they can support deletion while staying compact.
  • These filters belong naturally next to LSM trees, caches, and distributed systems because those systems constantly benefit from cheap negative checks.

Practice problems

ProblemDifficultyKey idea
Bloom filter overviewHardBit-array membership approximation
Cuckoo filter paperHardFingerprints plus relocations
RocksDB filter notesHardWhy storage engines care so much about negative checks

Relation to other topics

  • LSM Tree often uses Bloom-style filters to skip SSTables cheaply on misses.
  • Count-Min Sketch and HyperLogLog are sibling sketches, but they estimate frequency and cardinality rather than membership.
  • Hash Map explains why hashing can stand in for full key storage when approximation is acceptable.

Build on these first

These topics supply the mental model or underlying structure that this page assumes.

Related directions

These topics live nearby conceptually, even if they are not direct prerequisites.

More from Probabilistic systems

Stay in the same family when you want parallel variations of the same mental model.

Paths that include this topic

Follow one of these sequences if you want a guided next step instead of open-ended browsing.

Probabilistic systems

Learn how approximate membership, frequency, and cardinality structures trade tiny errors for massive scale.

From the blog

Pair the atlas with recent writing from the rest of the site when you want a broader engineering perspective alongside the topic graph.