Hash Map
O(1) average-case lookups via hashing. Understand collision resolution, load factors, and when hash maps are the right (or wrong) choice.
Family
Lookup & hashing
Fast membership tests, key/value indexing, and amortized constant-time access.
Builds on
Foundational
You can start here without another page first.
Unlocks
6 next topics
Use these follow-ups when you want to apply the current idea in a richer setting.
Learning paths
4
This topic appears in curated progressions so you can keep moving without guessing the next step.
Problem
You have n elements. You need to insert, lookup, and delete by key — not by position. A sorted array gives you O(log n) search via binary search, but O(n) insertion. A linked list gives O(1) insertion but O(n) search. A hash map gives you O(1) average for all three. That’s the entire point.
Intuition
A hash map is an array where the index is computed from the key. A hash function maps an arbitrary key to an integer, and you use hash(key) mod m (where m is the array size) as the bucket index. Two different keys can hash to the same index — that’s a collision, and you need a strategy to handle it.
Hash functions
A good hash function has three properties:
- Deterministic — same key always produces the same hash
- Uniform distribution — outputs spread evenly across the range, minimizing collisions
- Fast — the whole point is O(1); a slow hash defeats the purpose
For strings, a common approach is the polynomial rolling hash:
hash(s) = (s[0]·p^(n−1) + s[1]·p^(n−2) + … + s[n−1]) mod m
where p is a prime (31 or 37 work well for lowercase ASCII). For integers, a common approach is the division method: hash(k) = k mod m.
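A minimal Python sketch of the string case, using Horner’s rule to evaluate the polynomial without large intermediates (p = 31 and a table size of 2^16 are illustrative choices):

```python
def poly_hash(s: str, p: int = 31, m: int = 2**16) -> int:
    """Polynomial rolling hash: (s[0]*p^(n-1) + ... + s[n-1]) mod m."""
    h = 0
    for ch in s:
        h = (h * p + ord(ch)) % m  # Horner's rule: fold one character at a time
    return h

# Deterministic: the same key always lands in the same bucket.
print(poly_hash("cat") == poly_hash("cat"))  # True
```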
Collision resolution
Separate chaining
Each bucket stores a linked list. On collision, append to the list.
put(key, value):
  idx ← hash(key) mod m
  for each (k, v) in table[idx]:
    if k == key:
      update v to value
      return
  append (key, value) to table[idx]
  size++

get(key):
  idx ← hash(key) mod m
  for each (k, v) in table[idx]:
    if k == key:
      return v
  return NOT_FOUND
Simple, works well when load factor stays reasonable. Worst case: every key hashes to the same bucket → linked list traversal.
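The pseudocode above translates directly into a small Python class (a sketch with a fixed bucket count; resizing is covered below). The class name and default of 8 buckets are illustrative:

```python
class ChainedHashMap:
    """Hash map with separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, m: int = 8):
        self.m = m
        self.table = [[] for _ in range(m)]
        self.size = 0

    def put(self, key, value):
        bucket = self.table[hash(key) % self.m]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): append to chain
        self.size += 1

    def get(self, key):
        for k, v in self.table[hash(key) % self.m]:
            if k == key:
                return v
        return None                      # NOT_FOUND
```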
Open addressing
No linked lists — everything lives in the array. On collision, probe for the next empty slot.
put(key, value):
  i ← 0
  idx ← hash(key) mod m
  while table[idx] is occupied:
    if table[idx].key == key:
      update value
      return
    i ← i + 1
    idx ← (hash(key) + probe(i)) mod m
  table[idx] ← (key, value)
  size++
Linear probing: probe(i) = i. Simple but causes clustering — occupied slots clump together, degrading performance.
Quadratic probing: probe(i) = i². Reduces primary clustering but can fail to find empty slots if the table is too full.
Double hashing: probe(i) = i · h2(key), where h2 is a second hash function. Best distribution, but more expensive per probe.
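A sketch of the simplest variant, linear probing, in Python (deletion needs tombstones and is omitted; the sentinel object and fixed size are assumptions of this sketch):

```python
class LinearProbingMap:
    """Open addressing with linear probing: probe(i) = i."""

    _EMPTY = object()  # sentinel distinguishing "never used" from any real key

    def __init__(self, m: int = 8):
        self.m = m
        self.keys = [self._EMPTY] * m
        self.vals = [None] * m

    def put(self, key, value):
        idx = hash(key) % self.m
        for i in range(self.m):              # at most m probes
            j = (idx + i) % self.m
            if self.keys[j] is self._EMPTY or self.keys[j] == key:
                self.keys[j] = key
                self.vals[j] = value
                return
        raise RuntimeError("table full — resize before inserting")

    def get(self, key):
        idx = hash(key) % self.m
        for i in range(self.m):
            j = (idx + i) % self.m
            if self.keys[j] is self._EMPTY:  # empty slot ends the probe chain
                return None
            if self.keys[j] == key:
                return self.vals[j]
        return None
```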
Load factor and resizing
The load factor λ = n / m (elements / buckets) determines performance. For separate chaining, the average chain length is λ. For open addressing, the expected number of probes approaches 1 / (1 − λ) as λ → 1.
Most implementations resize (typically doubling m) when λ exceeds a threshold — 0.75 is the classic default (Java’s HashMap). Resizing means rehashing: allocate a new array of size 2m, recompute hash(key) mod 2m for every existing entry, and reinsert. This is O(n) but happens rarely enough that insertion remains amortized O(1).
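A hedged sketch of the rehash step for a chained table (the function shape is an assumption for illustration; real implementations fold this into the map class):

```python
def resize(table, m):
    """Double the bucket array and rehash every entry. O(n), but rare."""
    new_m = 2 * m
    new_table = [[] for _ in range(new_m)]
    for bucket in table:
        for key, value in bucket:
            # The bucket index depends on m, so every entry must be recomputed.
            new_table[hash(key) % new_m].append((key, value))
    return new_table, new_m
```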
Common patterns
Two-sum: store each number’s complement in a hash map as you iterate. O(n) time, O(n) space.
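In Python the pattern is a few lines (a sketch; returning index pairs is one common convention):

```python
def two_sum(nums, target):
    """Return indices i < j with nums[i] + nums[j] == target, else None."""
    seen = {}                       # value -> index of its earlier occurrence
    for j, x in enumerate(nums):
        if target - x in seen:      # complement already seen: done
            return seen[target - x], j
        seen[x] = j
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
```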
Frequency counting: count occurrences of each element. Foundation for group-anagrams (sorted string → key), majority element, top-k.
Two-pass pattern: first pass builds the map, second pass queries it. Avoids nested loops.
LRU cache: hash map (for lookup) + doubly linked list (for eviction order). The combination gives O(1) for both get and put.
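A compact Python sketch: `collections.OrderedDict` is itself a hash map backed by a doubly linked list, so it captures the combination directly (the capacity handling below is one common convention):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()   # hash map + doubly linked list in one structure

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```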
When hash maps fail
Worst case is O(n). If all keys collide, you degrade to a linked list. Java 8+ mitigates this by converting long chains to balanced trees (O(log n) worst case).
Hash flooding attacks: an adversary crafts keys that all hash to the same bucket, turning your server’s O(1) lookups into O(n). Mitigation: use randomized hash functions (SipHash) or keyed hashes.
No ordering: hash maps don’t maintain insertion order or sorted order (unless you use a LinkedHashMap or TreeMap variant). If you need range queries or sorted iteration, use a balanced BST instead.
Memory overhead: each entry carries key + value + pointer/metadata. For small keys, the overhead ratio is significant.
Complexity
| Operation | Average | Worst case |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Space | O(n) | O(n) |
| Resize | O(n), amortized O(1) per insert | O(n) |
The worst case occurs when all keys hash to the same bucket. With a good hash function and a reasonable load factor, you stay firmly in O(1) territory.
Key takeaways
- O(1) average case depends on a good hash function and a low load factor — degenerate inputs can push you to O(n).
- “Can I trade space for time with a lookup table?” is the hash map question — if yes, hash map is almost always the answer.
- Frequency counting is the foundation of many problems: group anagrams, majority element, top-k frequent — all start with a frequency map.
- Hash map + linked list = LRU cache — combining data structures to get O(1) for both lookup and eviction order is a classic design pattern.
- Beware of hash flooding: in adversarial settings, use randomized or keyed hash functions (SipHash) to prevent worst-case collisions.
Practice problems
| Problem | Difficulty | Key idea |
|---|---|---|
| Two Sum | 🟢 Easy | Store complement in map as you iterate |
| Group Anagrams | 🟡 Medium | Sorted string or character count as hash key |
| Subarray Sum Equals K | 🟡 Medium | Prefix sum frequencies in a hash map |
| Longest Consecutive Sequence | 🟡 Medium | Hash set for neighbor checks, only start from sequence beginnings |
| LRU Cache | 🟡 Medium | Hash map + doubly linked list for get and put |
Relation to other topics
Hash set: a hash map where you only store keys (no values). Same mechanics, same complexity.
Bloom filter: a probabilistic hash set. Uses k hash functions and a bit array. Can tell you “definitely not in set” or “probably in set” — no false negatives, but false positives. Much more space-efficient than a hash set when approximate membership is acceptable.
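A toy Python sketch of the idea (deriving k hashes by salting SHA-256 is an assumption of this sketch, not how production filters pick their hash functions; m = 1024 bits and k = 3 are illustrative):

```python
import hashlib

class BloomFilter:
    """Probabilistic set: k hash functions over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _indices(self, item: str):
        for i in range(self.k):
            # Salt with i to derive k different hash values from one function.
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str):
        for idx in self._indices(item):
            self.bits[idx] = True

    def might_contain(self, item: str) -> bool:
        # All k bits set -> "probably in set"; any bit clear -> "definitely not".
        return all(self.bits[idx] for idx in self._indices(item))
```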
Consistent hashing: distributes keys across nodes in a distributed system. When a node joins or leaves, only about K/n of the keys need remapping, where K is the number of keys and n the number of nodes (vs. rehashing everything). Used in distributed caches (Memcached, DynamoDB).
What this enables
Once the current idea feels natural, these are the most useful next jumps.
Cache Eviction Strategies (LRU, LFU, ARC, TinyLFU)
Treat cache eviction as an algorithm choice: recency, frequency, adaptation, and admission all optimize different workloads.
Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
Hash Join & Merge Join
Compare the two most important database join strategies: build-and-probe hashing versus ordered merge scanning.
Bloom Filter & Cuckoo Filter
Use tiny approximate membership structures to reject absent keys cheaply before touching slower storage or larger indexes.
Count-Min Sketch
Estimate item frequencies in a stream with a tiny counter matrix, a few hash functions, and a guaranteed one-sided error.
HyperLogLog
Estimate the number of distinct items in a huge stream with tiny registers and the statistics of rare hash patterns.
Related directions
These topics live nearby conceptually, even if they are not direct prerequisites.
Linked List Operations
Reverse, detect cycles, merge sorted lists. Linked lists test your pointer manipulation skills — every operation is about rewiring next pointers.
Trie (Prefix Tree)
A tree-like data structure for efficient string storage and prefix-based retrieval. Essential for autocomplete, spell checking, and word search problems.
Cache Eviction Strategies (LRU, LFU, ARC, TinyLFU)
Treat cache eviction as an algorithm choice: recency, frequency, adaptation, and admission all optimize different workloads.
Consistent Hashing
Map keys onto a ring so cluster membership changes move only nearby keys instead of reshuffling the whole keyspace.
Paths that include this topic
Follow one of these sequences if you want a guided next step instead of open-ended browsing.
Linear toolkit
Learn how sequence-shaped data unlocks queues, stacks, windows, and fast pointer manipulations.
Probabilistic systems
Learn how approximate membership, frequency, and cardinality structures trade tiny errors for massive scale.
Cache & distribution
Connect local eviction policy with distributed key placement and system-scale cache design.
Database operators
See how external sorting and join strategies turn in-memory algorithms into database execution plans.