
Hash Map

O(1) average-case lookups via hashing. Understand collision resolution, load factors, and when hash maps are the right (or wrong) choice.




Problem

You have n elements. You need to insert, look up, and delete by key — not by position. A sorted array gives you O(log n) search via binary search, but O(n) insertion. A linked list gives O(1) insertion but O(n) search. A hash map gives you O(1) average for all three. That’s the entire point.

Intuition

A hash map is an array where the index is computed from the key. A hash function h(k) maps an arbitrary key k to an integer, and you use h(k) mod m (where m is the array size) as the bucket index. Two different keys can hash to the same index — that’s a collision, and you need a strategy to handle it.

Hash functions

A good hash function has three properties:

  1. Deterministic — same key always produces the same hash
  2. Uniform distribution — outputs spread evenly across the range, minimizing collisions
  3. Fast — the whole point is O(1); a slow hash defeats the purpose

For strings, a common approach is polynomial rolling hash:

h(s) = \sum_{i=0}^{n-1} s[i] \cdot p^i \bmod m

where p is a prime (31 or 37 work well for lowercase ASCII). For integers, a common approach is h(k) = k · large_prime mod m.
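A minimal Python sketch of the rolling hash above, evaluated with Horner's rule; the defaults p = 31 and m = 10^9 + 7 are illustrative choices, not prescribed by the formula:

```python
def poly_hash(s: str, p: int = 31, m: int = 10**9 + 7) -> int:
    """h(s) = sum of s[i] * p^i, taken mod m."""
    h = 0
    for ch in reversed(s):       # Horner's rule, starting from the highest power
        h = (h * p + ord(ch)) % m
    return h
```

Because the function is deterministic, equal strings always collide on purpose, while distinct strings such as "ab" and "ba" get different hashes with high probability.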

Collision resolution

Separate chaining

Each bucket stores a linked list. On collision, append to the list.

put(key, value):
    idx ← hash(key) mod m
    for each (k, v) in table[idx]:
        if k == key:
            update v to value
            return
    append (key, value) to table[idx]
    size++

get(key):
    idx ← hash(key) mod m
    for each (k, v) in table[idx]:
        if k == key:
            return v
    return NOT_FOUND

Simple, and it works well when the load factor stays reasonable. Worst case: every key hashes to the same bucket → O(n) linked-list traversal.
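The chaining pseudocode above translates directly to Python; this sketch fixes the bucket count (no resizing) to stay short:

```python
class ChainedHashMap:
    """Separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, m: int = 8):
        self.buckets = [[] for _ in range(m)]
        self.m = m
        self.size = 0

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.m]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # existing key: update in place
                return
        bucket.append((key, value))        # new key: append to the chain
        self.size += 1

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.m]:
            if k == key:
                return v
        return None                        # NOT_FOUND
```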

Open addressing

No linked lists — everything lives in the array. On collision, probe for the next empty slot.

put(key, value):
    base ← hash(key) mod m
    i ← 0                               // i = attempt number
    idx ← base
    while table[idx] is occupied:
        if table[idx].key == key:
            update value
            return
        i ← i + 1
        idx ← (base + probe(i)) mod m
    table[idx] ← (key, value)
    size++

Linear probing: probe(i) = i. Simple but causes clustering — occupied slots clump together, degrading performance.

Quadratic probing: probe(i) = i². Reduces primary clustering but can fail to find empty slots if the table is too full.

Double hashing: probe(i) = i · h₂(k), where h₂ is a second hash function. Best distribution, but more expensive per probe.
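A sketch of open addressing with linear probing (probe(i) = i). Deletion and resizing are omitted, so the table must never fill completely:

```python
class LinearProbingMap:
    """Open addressing with linear probing; no tombstones or resizing."""

    EMPTY = object()                            # sentinel for free slots

    def __init__(self, m: int = 8):
        self.slots = [self.EMPTY] * m
        self.m = m

    def put(self, key, value):
        base = hash(key) % self.m
        for i in range(self.m):                 # i = attempt number
            idx = (base + i) % self.m           # linear probe sequence
            if self.slots[idx] is self.EMPTY or self.slots[idx][0] == key:
                self.slots[idx] = (key, value)
                return
        raise RuntimeError("table full")

    def get(self, key):
        base = hash(key) % self.m
        for i in range(self.m):
            slot = self.slots[(base + i) % self.m]
            if slot is self.EMPTY:
                return None                     # empty slot: key is absent
            if slot[0] == key:
                return slot[1]
        return None
```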

Load factor and resizing

The load factor α = n/m (elements / buckets) determines performance. For separate chaining, the average chain length is α. For open addressing, the expected number of probes grows like 1/(1 − α), which blows up as α → 1.

Most implementations resize (typically doubling m) when α exceeds a threshold — 0.75 is the classic default (Java’s HashMap). Resizing means rehashing: allocate a new array of size 2m, recompute h(k) mod 2m for every existing entry, and reinsert. This is O(n) but happens rarely enough that insertion remains amortized O(1).
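The resize step can be sketched for a separate-chaining layout (buckets as lists of pairs); the 0.75 threshold matches the classic default mentioned above:

```python
def maybe_resize(buckets, size, threshold=0.75):
    """Double the bucket count and rehash if the load factor exceeds threshold."""
    m = len(buckets)
    if size / m <= threshold:
        return buckets                     # load factor is fine: no work
    new_buckets = [[] for _ in range(2 * m)]
    for bucket in buckets:                 # O(n) rehash of every entry
        for key, value in bucket:
            new_buckets[hash(key) % (2 * m)].append((key, value))
    return new_buckets
```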

Common patterns

Two-sum: store each number’s complement in a hash map as you iterate. O(n) time, O(n) space.
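A one-pass sketch of the equivalent formulation: the map stores numbers already seen, and each new number checks whether its complement is there.

```python
def two_sum(nums, target):
    """Return indices (i, j) with nums[i] + nums[j] == target, or None."""
    seen = {}                          # value -> index of earlier occurrence
    for i, x in enumerate(nums):
        if target - x in seen:         # complement already seen?
            return seen[target - x], i
        seen[x] = i
    return None
```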

Frequency counting: count occurrences of each element. Foundation for group-anagrams (sorted string → key), majority element, top-k.
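Both ideas in a few lines of Python, using the sorted string as the anagram key (one of the options above); names here are illustrative:

```python
from collections import Counter, defaultdict

def group_anagrams(words):
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)   # sorted string as the hash key
    return list(groups.values())

# Frequency counting itself is a one-liner with Counter:
counts = Counter("mississippi")                # e.g. counts["s"] == 4
```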

Two-pass pattern: first pass builds the map, second pass queries it. Avoids nested loops.
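As a concrete instance (first-unique-character is a typical example of the pattern, chosen here for illustration):

```python
from collections import Counter

def first_unique_char(s: str) -> int:
    """Index of the first character that appears exactly once, or -1."""
    counts = Counter(s)                   # pass 1: build the frequency map
    for i, ch in enumerate(s):            # pass 2: query it in order
        if counts[ch] == 1:
            return i
    return -1
```

The nested-loop alternative would re-scan the string for every character, O(n²) instead of O(n).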

LRU cache: hash map (for O(1) lookup) + doubly linked list (for O(1) eviction order). The combination gives O(1) for both get and put.
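A compact sketch using Python's OrderedDict, which internally pairs a hash map with a doubly linked list, the same combination described above:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()          # maintains recency order

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
```

An interview answer usually builds the doubly linked list by hand; OrderedDict just makes the structure of the trick visible in a few lines.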

When hash maps fail

Worst case is O(n). If all keys collide, you degrade to a linked list. Java 8+ mitigates this by converting long chains to balanced trees (O(log n) worst case).

Hash flooding attacks: an adversary crafts keys that all hash to the same bucket, turning your O(1) server into O(n²). Mitigation: use randomized hash functions (SipHash) or keyed hashes.

No ordering: hash maps don’t maintain insertion order or sorted order (unless you use a LinkedHashMap or TreeMap variant). If you need range queries or sorted iteration, use a balanced BST instead.

Memory overhead: each entry carries key + value + pointer/metadata. For small keys, the overhead ratio is significant.

Complexity

Operation   Average   Worst case
Insert      O(1)      O(n)
Lookup      O(1)      O(n)
Delete      O(1)      O(n)
Space       O(n)      O(n)
Resize      O(n)      O(n)

Worst case occurs when all keys hash to the same bucket. With a good hash function and a reasonable load factor, you stay firmly in O(1) territory.

Key takeaways

  • O(1) average case depends on a good hash function and a low load factor — degenerate inputs can push you to O(n).
  • “Can I trade space for time with a lookup table?” is the hash map question — if yes, a hash map is almost always the answer.
  • Frequency counting is the foundation of many problems: group anagrams, majority element, top-k frequent — all start with a frequency map.
  • Hash map + linked list = LRU cache — combining data structures to get O(1) for both lookup and eviction order is a classic design pattern.
  • Beware of hash flooding: in adversarial settings, use randomized or keyed hash functions (SipHash) to prevent worst-case O(n) collisions.

Practice problems

Problem                        Difficulty   Key idea
Two Sum                        🟢 Easy      Store complement in map as you iterate
Group Anagrams                 🟡 Medium    Sorted string or character count as hash key
Subarray Sum Equals K          🟡 Medium    Prefix sum frequencies in a hash map
Longest Consecutive Sequence   🟡 Medium    Hash set for O(1) neighbor checks; only start from sequence beginnings
LRU Cache                      🟡 Medium    Hash map + doubly linked list for O(1) get and put

Relation to other topics

Hash set: a hash map where you only store keys (no values). Same mechanics, same complexity.

Bloom filter: a probabilistic hash set. Uses k hash functions and a bit array. Can tell you “definitely not in set” or “probably in set” — no false negatives, but false positives. Much more space-efficient than a hash set when approximate membership is acceptable.
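A toy sketch of the idea; deriving the k hash functions from Python's built-in hash over an (i, item) tuple is an illustrative shortcut, not how production filters do it:

```python
class BloomFilter:
    """k hash positions set in a bit array; 'no' is definite, 'yes' is probable."""

    def __init__(self, bits: int = 64, k: int = 3):
        self.bits = bits
        self.k = k
        self.array = 0                            # bit array stored as an int

    def _positions(self, item):
        return [hash((i, item)) % self.bits for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.array |= 1 << pos                # set each of the k bits

    def might_contain(self, item) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.array & (1 << pos) for pos in self._positions(item))
```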

Consistent hashing: distributes keys across nodes in a distributed system. When a node joins or leaves, only 1/n of the keys need remapping (vs. rehashing everything). Used in distributed caches (Memcached, DynamoDB).
