Hash Map
O(1) average-case lookups via hashing. Understand collision resolution, load factors, and when hash maps are the right (or wrong) choice.
Family
Lookup & hashing
Fast membership tests, key/value indexing, and amortized constant-time access.
Builds on
Foundational
You can start here without another page first.
Unlocks
6 next topics
Use these follow-ups when you want to apply the current idea in a richer setting.
Learning paths
4
This topic appears in curated progressions so you can keep moving without guessing the next step.
Problem
You have n elements. You need to insert, lookup, and delete by key — not by position. A sorted array gives you O(log n) search via binary search, but O(n) insertion. A linked list gives O(1) insertion but O(n) search. A hash map gives you O(1) average for all three. That’s the entire point.
Intuition
A hash map is an array where the index is computed from the key. A hash function maps an arbitrary key to an integer, and you use hash(key) mod m (where m is the array size) as the bucket index. Two different keys can hash to the same index — that’s a collision, and you need a strategy to handle it.
Hash functions
A good hash function has three properties:
- Deterministic — same key always produces the same hash
- Uniform distribution — outputs spread evenly across the range, minimizing collisions
- Fast — the whole point is O(1); a slow hash defeats the purpose
For strings, a common approach is the polynomial rolling hash:
hash(s) = (s[0]·p^(n−1) + s[1]·p^(n−2) + … + s[n−1]) mod m
where p is a prime (31 or 37 work well for lowercase ASCII). For integers, a common approach is the division method: hash(k) = k mod m.
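A minimal Python sketch of the string case, using Horner’s rule to evaluate the polynomial without large intermediates (p = 31 and a table size of 2^16 are illustrative choices):

```python
def poly_hash(s: str, p: int = 31, m: int = 2**16) -> int:
    """Polynomial rolling hash: (s[0]*p^(n-1) + ... + s[n-1]) mod m."""
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % m  # Horner's rule: fold one character at a time
    return h

# Deterministic: the same key always lands in the same bucket.
print(poly_hash("cat") == poly_hash("cat"))  # True
```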
Collision resolution
Separate chaining
Each bucket stores a linked list. On collision, append to the list.
put(key, value):
  idx ← hash(key) mod m
  for each (k, v) in table[idx]:
    if k == key:
      update v to value
      return
  append (key, value) to table[idx]
  size++

get(key):
  idx ← hash(key) mod m
  for each (k, v) in table[idx]:
    if k == key:
      return v
  return NOT_FOUND
Simple, works well when load factor stays reasonable. Worst case: every key hashes to the same bucket → linked list traversal.
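The pseudocode above translates directly into a small Python class (a sketch with a fixed bucket count; resizing is covered below). The class name and default of 8 buckets are illustrative:

```python
class ChainedHashMap:
    """Hash map with separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, m: int = 8):
        self.m = m
        self.table = [[] for _ in range(m)]
        self.size = 0

    def put(self, key, value):
        bucket = self.table[hash(key) % self.m]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): append to chain
        self.size += 1

    def get(self, key):
        for k, v in self.table[hash(key) % self.m]:
            if k == key:
                return v
        return None                      # NOT_FOUND
```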
Open addressing
No linked lists — everything lives in the array. On collision, probe for the next empty slot.
put(key, value):
  i ← 0
  idx ← hash(key) mod m
  while table[idx] is occupied:
    if table[idx].key == key:
      update value
      return
    i ← i + 1
    idx ← (hash(key) + probe(i)) mod m
  table[idx] ← (key, value)
  size++
Linear probing: probe(i) = i. Simple but causes clustering — occupied slots clump together, degrading performance.
Quadratic probing: probe(i) = i². Reduces primary clustering but can fail to find empty slots if the table is too full.
Double hashing: probe(i) = i · h2(key), where h2 is a second hash function. Best distribution, but more expensive per probe.
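A sketch of the simplest variant, linear probing, in Python (deletion needs tombstones and is omitted; the sentinel object and fixed size are assumptions of this sketch):

```python
class LinearProbingMap:
    """Open addressing with linear probing: probe(i) = i."""

    _EMPTY = object()  # sentinel distinguishing "never used" from any real key

    def __init__(self, m: int = 8):
        self.m = m
        self.keys = [self._EMPTY] * m
        self.vals = [None] * m

    def put(self, key, value):
        idx = hash(key) % self.m
        for i in range(self.m):              # at most m probes
            j = (idx + i) % self.m
            if self.keys[j] is self._EMPTY or self.keys[j] == key:
                self.keys[j] = key
                self.vals[j] = value
                return
        raise RuntimeError("table full — resize before inserting")

    def get(self, key):
        idx = hash(key) % self.m
        for i in range(self.m):
            j = (idx + i) % self.m
            if self.keys[j] is self._EMPTY:  # empty slot ends the probe chain
                return None
            if self.keys[j] == key:
                return self.vals[j]
        return None
```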
Load factor and resizing
The load factor λ = n / m (elements / buckets) determines performance. For separate chaining, the average chain length is λ. For open addressing, the expected number of probes approaches 1 / (1 − λ) as λ → 1.
Most implementations resize (typically doubling m) when λ exceeds a threshold — 0.75 is the classic default (Java’s HashMap). Resizing means rehashing: allocate a new array of size 2m, recompute hash(key) mod 2m for every existing entry, and reinsert. This is O(n) but happens rarely enough that insertion remains amortized O(1).
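A hedged sketch of the rehash step for a chained table (the function shape is an assumption for illustration; real implementations fold this into the map class):

```python
def resize(table, m):
    """Double the bucket array and rehash every entry. O(n), but rare."""
    new_m = 2 * m
    new_table = [[] for _ in range(new_m)]
    for bucket in table:
        for key, value in bucket:
            # The bucket index depends on m, so every entry must be recomputed.
            new_table[hash(key) % new_m].append((key, value))
    return new_table, new_m
```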
Common patterns
Two-sum: store each number’s complement in a hash map as you iterate. O(n) time, O(n) space.
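In Python the pattern is a few lines (a sketch; returning index pairs is one common convention):

```python
def two_sum(nums, target):
    """Return indices i < j with nums[i] + nums[j] == target, else None."""
    seen = {}                       # value -> index of its earlier occurrence
    for j, x in enumerate(nums):
        if target - x in seen:      # complement already seen: done
            return seen[target - x], j
        seen[x] = j
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```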
Frequency counting: count occurrences of each element. Foundation for group-anagrams (sorted string → key), majority element, top-k.
Two-pass pattern: first pass builds the map, second pass queries it. Avoids nested loops.
LRU cache: hash map (for lookup) + doubly linked list (for eviction order). The combination gives O(1) for both get and put.
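A compact Python sketch: `collections.OrderedDict` is itself a hash map backed by a doubly linked list, so it captures the combination directly (the capacity handling below is one common convention):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()   # hash map + doubly linked list in one structure

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```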
When hash maps fail
Worst case is O(n). If all keys collide, you degrade to a linked list. Java 8+ mitigates this by converting long chains to balanced trees (O(log n) worst case).
Hash flooding attacks: an adversary crafts keys that all hash to the same bucket, turning your server’s O(1) lookups into O(n). Mitigation: use randomized hash functions (SipHash) or keyed hashes.
No ordering: hash maps don’t maintain insertion order or sorted order (unless you use a LinkedHashMap or TreeMap variant). If you need range queries or sorted iteration, use a balanced BST instead.
Memory overhead: each entry carries key + value + pointer/metadata. For small keys, the overhead ratio is significant.
Complexity
| Operation | Average | Worst case |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Space | O(n) | O(n) |
| Resize | O(n), amortized O(1) per insert | O(n) |
The worst case occurs when all keys hash to the same bucket. With a good hash function and a reasonable load factor, you stay firmly in O(1) territory.
Key takeaways
- O(1) average case depends on a good hash function and a low load factor — degenerate inputs can push you to O(n).
- “Can I trade space for time with a lookup table?” is the hash map question — if yes, hash map is almost always the answer.
- Frequency counting is the foundation of many problems: group anagrams, majority element, top-k frequent — all start with a frequency map.
- Hash map + linked list = LRU cache — combining data structures to get O(1) for both lookup and eviction order is a classic design pattern.
- Beware of hash flooding: in adversarial settings, use randomized or keyed hash functions (SipHash) to prevent worst-case collisions.
Practice problems
| Problem | Difficulty | Key idea |
|---|---|---|
| Two Sum | 🟢 Easy | Store complement in map as you iterate |
| Group Anagrams | 🟡 Medium | Sorted string or character count as hash key |
| Subarray Sum Equals K | 🟡 Medium | Prefix sum frequencies in a hash map |
| Longest Consecutive Sequence | 🟡 Medium | Hash set for neighbor checks, only start from sequence beginnings |
| LRU Cache | 🟡 Medium | Hash map + doubly linked list for get and put |
Relation to other topics
Hash set: a hash map where you only store keys (no values). Same mechanics, same complexity.
Bloom filter: a probabilistic hash set. Uses k hash functions and a bit array. Can tell you “definitely not in set” or “probably in set” — no false negatives, but false positives. Much more space-efficient than a hash set when approximate membership is acceptable.
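A toy Python sketch of the idea (deriving k hashes by salting SHA-256 is an assumption of this sketch, not how production filters pick their hash functions; m = 1024 bits and k = 3 are illustrative):

```python
import hashlib

class BloomFilter:
    """Probabilistic set: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indices(self, item: str):
        for i in range(self.k):
            # Salt with i to derive k different hash values from one function.
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str):
        for idx in self._indices(item):
            self.bits[idx] = True

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "probably in set"; any bit clear -> "definitely not".
        return all(self.bits[idx] for idx in self._indices(item))
```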
Consistent hashing: distributes keys across nodes in a distributed system. When a node joins or leaves, only about K/n of the keys need remapping, where K is the number of keys and n the number of nodes (vs. rehashing everything). Used in distributed caches (Memcached, DynamoDB).
What this enables
Once the current idea feels natural, these are the most useful next jumps.
Cache Eviction Strategies (LRU, LFU, ARC, TinyLFU)
Treat cache eviction as an algorithm choice: recency, frequency, adaptation, and admission all optimize different workloads.
Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
Hash Join & Merge Join
Compare the two most important database join strategies: build-and-probe hashing versus ordered merge scanning.
Bloom Filter & Cuckoo Filter
Use tiny approximate membership structures to reject absent keys cheaply before touching slower storage or larger indexes.
Count-Min Sketch
Estimate item frequencies in a stream with a tiny counter matrix, a few hash functions, and a guaranteed one-sided error.
HyperLogLog
Estimate the number of distinct items in a huge stream with tiny registers and the statistics of rare hash patterns.
Related directions
These topics live nearby conceptually, even if they are not direct prerequisites.
Linked List Operations
Reverse, detect cycles, merge sorted lists. Linked lists test your pointer manipulation skills — every operation is about rewiring next pointers.
Trie (Prefix Tree)
A tree-like data structure for efficient string storage and prefix-based retrieval. Essential for autocomplete, spell checking, and word search problems.
Cache Eviction Strategies (LRU, LFU, ARC, TinyLFU)
Treat cache eviction as an algorithm choice: recency, frequency, adaptation, and admission all optimize different workloads.
Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
Paths that include this topic
Follow one of these sequences if you want a guided next step instead of open-ended browsing.
Linear toolkit
Learn how sequence-shaped data unlocks queues, stacks, windows, and fast pointer manipulations.
Probabilistic systems
Learn how approximate membership, frequency, and cardinality structures trade tiny errors for massive scale.
Cache & distribution
Connect local eviction policy with distributed key placement and system-scale cache design.
Database operators
See how external sorting and join strategies turn in-memory algorithms into database execution plans.