Skip to main content
Back to the DSA atlas
Systems & storage Data structure Advanced

B-Tree

Store many keys per node so ordered search stays shallow and expensive page reads stay low.

b-treedatabaseindexsearch

Interactive visualization

Use the controls to connect the idea to concrete operations before diving into the full write-up.

root [10 | 20]

left child [3 | 7 | 9]

middle child [12 | 15 | 18]

right child [22 | 25 | 30]

Visited nodes

root [10 | 20]
middle child [12 | 15 | 18]

Wide nodes beat tall trees on disk

Because each node holds many keys, the tree stays shallow and disk reads stay low.

Family

Systems & storage

Database indexes, cache policies, query operators, sharding, and external-memory algorithms.

Builds on

2 topics

Read these first if you want the surrounding context.

Unlocks

1 next topic

Use these follow-ups when you want to apply the current idea in a richer setting.

Learning paths

2

This topic appears in curated progressions so you can keep moving without guessing the next step.

Choose this over that

Choose the storage index that matches the workload

B-Trees minimize page depth, but practical engines often push farther toward range scans or write optimization.

Binary Search Tree (BST)

Choose this when: The structure is in memory and page-level storage costs are not the bottleneck.

Choose something else when: Disk or page reads dominate the cost model.

Current topic

B-Tree

Choose this when: You want the core page-aware ordered-search structure and a clean bridge from classic trees to storage indexes.

Choose something else when: Range scans or write-heavy buffering dominate the design.

Systems & storage Open topic

B+ Tree

Choose this when: Point lookups and especially ordered scans are the main indexed workload.

Choose something else when: You are learning the general page-aware tree idea first.

Systems & storage Open topic

LSM Tree

Choose this when: Write throughput matters more than maintaining one eagerly updated on-disk tree.

Choose something else when: Predictable indexed reads and scans are the main priority.

Problem

A binary search tree is conceptually clean, but it is a bad storage-engine default when every node visit may cost a page read.

If one lookup walks through many tiny nodes, the tree spends more time waiting for storage than comparing keys.

Intuition

A B-Tree reduces height by storing many keys per node.

Each node corresponds to a page-sized block of sorted separator keys. Instead of branching by 2, the tree branches by a large factor B, so the height becomes roughly:

O(logBn)O(\log_B n)

That is the whole design goal: trade more work inside one node for fewer page hops between nodes.

Structural rules

Using the standard minimum-degree notation t:

  • every node stores between t - 1 and 2t - 1 keys, except the root
  • an internal node with k keys has k + 1 children
  • every non-root internal node has between t and 2t children
  • keys inside a node are sorted
  • all leaves appear at the same depth

Those occupancy rules keep the tree balanced while preserving high fanout.

Search inside the current node, then descend into the child interval that contains the target key.

search(node, key):
    find smallest i such that key <= node.keys[i]

    if i exists and node.keys[i] == key:
        return FOUND
    if node is leaf:
        return NOT_FOUND
    return search(node.child[i], key)

In practice, the search inside one node is often a binary search over that node’s sorted key array.

Insertion

The clean insertion strategy is: never descend into a full child.

Split a full child

split_child(parent, i):
    y <- parent.child[i]
    z <- new node

    move y.keys[t .. 2t-2] into z
    median <- y.keys[t-1]
    keep y.keys[0 .. t-2]

    if y is not leaf:
        move y.children[t .. 2t-1] into z

    insert z as parent's child after y
    insert median into parent.keys at position i

Insert into a non-full node

insert_non_full(node, key):
    if node is leaf:
        insert key into node.keys in sorted order
        return

    choose child i that should receive key
    if child i is full:
        split_child(node, i)
        if key > node.keys[i]:
            i <- i + 1
    insert_non_full(node.child[i], key)

Full insertion

insert(tree, key):
    if root is full:
        newRoot <- new internal node
        newRoot.child[0] <- old root
        split_child(newRoot, 0)
        root <- newRoot
    insert_non_full(root, key)

The split pushes a separator upward and keeps the tree balanced automatically.

Deletion

Deletion is harder because nodes are not allowed to become too empty.

High-level cases:

  1. Delete from a leaf if the node still has enough keys afterward.
  2. Delete from an internal node by replacing the key with its predecessor or successor if a child can spare one.
  3. Borrow from a sibling if a target child is too small.
  4. Merge siblings if neither sibling can spare a key, then continue recursively.

The invariant is the mirror image of insertion: when descending, try not to enter a child that is already at the minimum occupancy.

Worked example

Suppose one page can hold about 200 keys.

Then one root page can branch to roughly 201 children. Two more levels below it cover about:

20138,120,601201^3 \approx 8{,}120{,}601

leaf ranges.

That is why B-Trees are such a natural storage-engine structure: a tiny height can cover a huge ordered dataset.

Complexity

OperationTime
SearchO(log_B n) page levels
InsertO(log_B n)
DeleteO(log_B n)
SpaceO(n)

Within each page there is also a small search over the node’s key array, but the dominant cost in storage systems is usually page depth.

B-Tree vs. BST vs. B+ Tree

StructureFanoutBest for
Binary search tree2In-memory ordered search
B-TreeLargePage-efficient ordered search
B+ TreeLarge, leaf-linkedDatabase indexes and range scans

B-Trees are the conceptual bridge from classic search trees to real storage-engine indexes.

Key takeaways

  • B-Trees exist to minimize page depth, not just comparison count.
  • The defining operations are split on overflow and borrow/merge on underflow.
  • High fanout is what makes the structure shallow enough for storage systems.
  • Search, insert, and delete all stay logarithmic in the number of page levels.
  • B+ Trees take the same page-aware idea and optimize it further for range scans and practical indexing.

Practice problems

ProblemDifficultyKey idea
B-Tree overviewHardPage-sized balanced search tree
PostgreSQL B-tree docsHardOrdered indexes in a real engine
CLRS B-tree chapter notesHardSplit, insert, and delete invariants

Relation to other topics

  • Binary Search explains why sorted separators are enough to route a search.
  • Binary Search Tree is the narrow-fanout in-memory ancestor of the same ordered-search idea.
  • B+ Tree is the database-optimized variant most engines prefer for indexed scans.
  • External Merge Sort sits nearby because both topics are shaped by page and disk costs.

Build on these first

These topics supply the mental model or underlying structure that this page assumes.

What this enables

Once the current idea feels natural, these are the most useful next jumps.

Related directions

These topics live nearby conceptually, even if they are not direct prerequisites.

More from Systems & storage

Stay in the same family when you want parallel variations of the same mental model.

Paths that include this topic

Follow one of these sequences if you want a guided next step instead of open-ended browsing.

Ordered structures

See how order can come from sorted arrays, search trees, skip lists, disk-friendly indexes, heaps, and prefix trees.

Storage engines

Compare fanout-heavy indexes, write-optimized trees, and external-memory sorting in one storage-oriented path.

From the blog

Pair the atlas with recent writing from the rest of the site when you want a broader engineering perspective alongside the topic graph.