Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
Interactive visualization (not reproduced here): example keys such as user:17, post:88, and feed:9 are hashed onto the ring and resolve to servers A, B, and C.
Stable sharding under cluster changes
Consistent hashing avoids full remapping when servers are added or removed. That is why it is so common in distributed caches and partitioned key-value systems.
Choose this over that
Separate distributed placement from local cache policy
Consistent hashing decides which machine owns the key. It is not a replacement for the local data structure or eviction policy.
Hash Map
Choose this when: The question is local key lookup inside one process or one machine.
Choose something else when: The mapping must remain stable while the cluster changes.
Consistent Hashing
Choose this when: Adding or removing machines should move only nearby keys instead of reshuffling the whole keyspace.
Choose something else when: The problem is local eviction or local membership, not distributed ownership.
Cache Eviction Strategies (LRU, LFU, ARC, TinyLFU)
Choose this when: The system has already chosen the owning machine and now needs to decide what survives in memory on that machine.
Choose something else when: The real problem is placement across the cluster.
Problem
If you place keys with hash(key) % n, changing n remaps almost everything.
That is fine for one process, but terrible for distributed caches and sharded storage. Adding one server should not invalidate the entire cluster’s placement map.
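To see the scale of the problem, here is a small sketch (illustrative key names and server counts, not from any real deployment) that counts how many keys change buckets when a cluster grows from 4 to 5 servers. With a uniform hash, roughly n/(n+1) of the keys move.

```python
# Minimal sketch of the modulo-placement problem: count how many sample keys
# change buckets when the server count grows from 4 to 5.
import hashlib

def bucket(key: str, n: int) -> int:
    # Stable 64-bit hash so the result is reproducible across runs.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(bucket(k, 4) != bucket(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change owners")  # roughly 80%
```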
Intuition
Consistent hashing replaces the linear bucket range with a ring.
- hash every server onto the ring
- hash every key onto the same ring
- assign the key to the first server encountered while walking clockwise
Now a server owns only the interval immediately before its ring position. When a new server appears, it steals only that interval instead of forcing a global reshuffle.
Data structure model
A typical implementation stores the server positions in a sorted structure:
- sorted array + binary search
- balanced tree / ordered map
- skip list or B-tree-like ordered index
Real systems also use virtual nodes: each physical server is hashed to many ring positions.
That smooths imbalance because a server no longer owns one giant interval. It owns many smaller intervals scattered around the ring.
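As a concrete sketch, the ring can be nothing more than a sorted list of (position, server) pairs with one entry per virtual node. The snippet below is illustrative: SHA-256 truncated to 64 bits stands in for whatever hash function the system actually uses, and the server names and replica count are made up.

```python
# Sketch of the ring stored as a sorted list of (position, server) pairs,
# one entry per virtual node. Hash choice and names are illustrative.
import hashlib

def ring_position(label: str) -> int:
    # 64-bit position derived from SHA-256; any well-mixed hash works.
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

REPLICAS = 100  # virtual nodes per physical server

ring = sorted(
    (ring_position(f"{server}#{r}"), server)
    for server in ("server-a", "server-b", "server-c")
    for r in range(REPLICAS)
)
print(len(ring), "virtual nodes; first entry:", ring[0])
```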
Core operations
Lookup
```
owner(key):
    pos <- hash(key)
    i   <- lower_bound(ringPositions, pos)
    if i exists:
        return ringPositions[i].server
    return ringPositions[0].server   // wrap around the ring
```
The only algorithmic requirement is an ordered search for the first server clockwise from the key.
Add a server
```
add_server(server, replicas):
    for r in 1 .. replicas:
        pos <- hash(server.id || "#" || r)
        insert (pos, server) into ordered ring
```
Each inserted virtual node steals only the interval between its predecessor and itself.
Remove a server
```
remove_server(server):
    remove all virtual-node positions belonging to server
```
The keys in those intervals fall through to the next clockwise server.
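Put together, the three operations above fit in a small class. The following is a minimal Python sketch of the sorted-list-plus-binary-search variant; the class name, hash choice, and replica count are illustrative, and a production ring would add replication, rebalancing hooks, and concurrency control.

```python
# Minimal consistent-hash ring: sorted positions + binary search.
# Illustrative sketch; real systems add replication, rebalancing, etc.
import bisect
import hashlib

def _position(label: str) -> int:
    # 64-bit ring position derived from SHA-256.
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

class ConsistentHashRing:
    def __init__(self, replicas=100):
        self.replicas = replicas   # virtual nodes per physical server
        self.positions = []        # sorted ring positions
        self.owners = {}           # position -> server name

    def add_server(self, server):
        for r in range(self.replicas):
            pos = _position(f"{server}#{r}")
            bisect.insort(self.positions, pos)
            self.owners[pos] = server

    def remove_server(self, server):
        # Keys in the removed intervals fall through to the next clockwise server.
        self.positions = [p for p in self.positions if self.owners[p] != server]
        self.owners = {p: s for p, s in self.owners.items() if s != server}

    def owner(self, key):
        pos = _position(key)
        i = bisect.bisect_left(self.positions, pos)  # first position >= pos
        if i == len(self.positions):                 # wrap around the ring
            i = 0
        return self.owners[self.positions[i]]

ring = ConsistentHashRing()
for s in ("server-a", "server-b", "server-c"):
    ring.add_server(s)
print(ring.owner("user:17"), ring.owner("post:88"), ring.owner("feed:9"))
```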
Worked example
Assume the ring positions are:
- A at 20
- B at 60
- C at 85
Then the ownership intervals are:
- (85, 20] -> A (wrapping past 0)
- (20, 60] -> B
- (60, 85] -> C
So:
- key hash 10 -> A
- key hash 42 -> B
- key hash 74 -> C
- key hash 91 -> wrap -> A
Now add D at 70.
Only the interval (60, 70] moves from C to D. Keys elsewhere stay put. That local movement is the entire point of the technique.
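The same walk can be checked mechanically. The snippet below hard-codes the example's ring positions rather than hashing anything, so the numbers match the text; key hash 65 is introduced only to show a key inside the moved interval.

```python
# Check the worked example with hard-coded ring positions (no real hashing).
import bisect

ring = [(20, "A"), (60, "B"), (85, "C")]   # sorted (position, server) pairs

def owner(ring, key_hash):
    positions = [p for p, _ in ring]
    i = bisect.bisect_left(positions, key_hash)
    if i == len(ring):        # nothing clockwise before the end: wrap around
        i = 0
    return ring[i][1]

for h in (10, 42, 74, 91):
    print(h, "->", owner(ring, h))      # A, B, C, A

ring = sorted(ring + [(70, "D")])       # add D at 70
print(65, "->", owner(ring, 65))        # D: 65 sits in the moved (60, 70]
print(74, "->", owner(ring, 74))        # still C: outside the moved interval
```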
Why virtual nodes matter
Without virtual nodes, one unlucky server hash can own a huge fraction of the ring.
With R virtual nodes per machine:
- load balances more evenly
- adding one physical machine introduces many small ownership changes instead of one large cliff
- replicas and heterogeneous capacity become easier to express
A bigger machine can simply receive more virtual nodes.
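One way to see the effect is to measure the largest ownership share with and without virtual nodes. The sketch below uses three illustrative servers and SHA-256-derived positions; exact numbers vary with the hash, but the 100-replica run lands much closer to an even 1/3 split.

```python
# Compare ring ownership balance with 1 vs. 100 virtual nodes per server.
# Server names and replica counts are illustrative.
import hashlib

RING = 2**64

def position(label: str) -> int:
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")

def max_share(servers, replicas):
    nodes = sorted((position(f"{s}#{r}"), s) for s in servers for r in range(replicas))
    share = {s: 0 for s in servers}
    # Each virtual node owns the interval from its predecessor up to itself.
    for (pos, server), (prev, _) in zip(nodes, [nodes[-1]] + nodes[:-1]):
        share[server] += (pos - prev) % RING
    return max(v / RING for v in share.values())

servers = ["server-a", "server-b", "server-c"]
print("1 vnode:   ", round(max_share(servers, 1), 3))    # often far from 1/3
print("100 vnodes:", round(max_share(servers, 100), 3))  # much closer to 1/3
```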
Complexity
Assume there are V virtual nodes total.
| Operation | Time | Notes |
|---|---|---|
| Lookup | O(log V) | Binary search in a sorted ring |
| Add one physical server with R virtual nodes | O(R log V) | Insert R ordered positions |
| Remove one physical server with R virtual nodes | O(R log V) | Delete its ordered positions |
| Key remapping on topology change | proportional to affected intervals | Only nearby keys move |
The asymptotic lookup cost is not the main win. The real win is minimal remapping under membership change.
Consistent hashing vs. modulo hashing
| Scheme | Lookup rule | What happens when a server changes? |
|---|---|---|
| hash(key) % n | Direct modulo bucket | Nearly every key can move |
| Consistent hashing | First clockwise server on a ring | Only keys in nearby intervals move |
Modulo hashing is simpler to implement. Consistent hashing is the right choice when stability under scaling events matters more than implementation simplicity.
Key takeaways
- Consistent hashing is about stable placement, not faster hashing.
- The algorithm needs an ordered ring of server positions and a clockwise owner rule.
- Virtual nodes are what make the technique robust in real systems.
- Only a small region of keys moves when a node joins or leaves, which is why caches and sharded stores rely on it.
- Cache eviction and consistent hashing solve different problems: one chooses the local victim, the other chooses the owning machine.
Practice problems
| Problem | Difficulty | Key idea |
|---|---|---|
| Consistent hashing overview | Hard | Ring-based stable remapping |
| Dynamo paper | Hard | Partitioning and virtual nodes in a production key-value store |
| Memcached client hashing notes | Hard | Why distributed caches care about remapping behavior |
Relation to other topics
- Hash Map gives the single-machine hashing intuition that consistent hashing generalizes to a cluster.
- Cache Eviction Strategies decide what stays in one node’s cache once consistent hashing has placed the key there.
- Bloom Filter & Cuckoo Filter are often paired with distributed placement to avoid useless downstream probes.