System Design Atlas

Learn production systems by mechanism and trade-off, not by memorizing one canned answer per interview prompt.

Deep dives

10

Production-focused write-ups that stay concrete about APIs, storage, and failure handling.

Families

3

Follow topics by system concern instead of treating every interview design as an isolated whiteboard prompt.

End-to-end designs

4

These pages optimize for production architecture, not just naming the right buzzwords in an interview.

Learning paths

3

Curated sequences keep the next deep technical topic obvious as the atlas expands.

Navigate the system design atlas

Search by mechanism, focus by family, and keep the current slice in the URL so it is easy to reopen the same learning thread later.

System design topics

10 matching topics.

Traffic management · Foundation · Advanced

Token Bucket, GCRA, and Virtual Time

Understand token-based rate limiting mathematically: saturated integrators, debt-space duals, and why token bucket and GCRA are the same policy in different coordinates.

Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Read now

Self-contained enough to open without another page first.

Learning paths

2 curated paths currently include this deep dive.

Unlocks

Designing a Rate Limiter (at Scale, Production-Grade)

token-bucket, gcra, virtual-time, rate-limiting, integrator, traffic-shaping
Traffic management · End-to-end design · Advanced

Designing a Rate Limiter (at Scale, Production-Grade)

Design a limiter that is actually deployable: low-latency enforcement, burst handling, distributed quotas, multi-region coordination, and failure-safe behavior.

Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Read now

Self-contained enough to open without another page first.

Learning paths

3 curated paths currently include this deep dive.

Unlocks

Global Quotas (Hierarchical Budgets Across Regions and Fleets); Load Shedding (Protecting Latency Under Saturation); Feedback Control for Autoscaling and Load Shedding

rate-limiting, token-bucket, sliding-window, redis, multi-region, control-plane
Traffic management · End-to-end design · Advanced

Global Quotas (Hierarchical Budgets Across Regions and Fleets)

Design worldwide quotas without putting a globally serialized dependency in the request path, using hierarchical allocation, leased budgets, and bounded overshoot.

Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Builds on

Designing a Rate Limiter (at Scale, Production-Grade); Distributed Locking (Leases, Fencing Tokens, and When Not to Use It)

Learning paths

3 curated paths currently include this deep dive.

global-quotas, budget-allocation, multi-region, hierarchical-limits, leases, fairness
Traffic management · End-to-end design · Advanced

Load Shedding (Protecting Latency Under Saturation)

Design admission control that drops the right work at the right time, using concurrency, queue depth, cost, and priority instead of letting the service fail slowly.

Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Builds on

Circuit Breakers (State Machines, Hysteresis, and Fast Failure)

Learning paths

2 curated paths currently include this deep dive.

load-shedding, admission-control, concurrency, brownout, overload, latency
Control plane · End-to-end design · Advanced

Feature Flags Control Plane (Versioning, Distribution, and Safe Rollouts)

Design a feature flag platform that supports low-latency local evaluation, strong auditability, deterministic targeting, and safe configuration rollouts across a fleet.

Control plane lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Read now

Self-contained enough to open without another page first.

Learning paths

1 curated path currently includes this deep dive.

Unlocks

Global Quotas (Hierarchical Budgets Across Regions and Fleets)

feature-flags, control-plane, rollouts, xds, targeting, configuration
Control plane · Trade-off · Advanced

Distributed Locking (Leases, Fencing Tokens, and When Not to Use It)

Design distributed locking with explicit guarantees, stale-owner protection, and realistic failure semantics instead of assuming a lock magically creates correctness.

Control plane lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Read now

Self-contained enough to open without another page first.

Learning paths

1 curated path currently includes this deep dive.

Unlocks

Global Quotas (Hierarchical Budgets Across Regions and Fleets)

distributed-locking, leases, fencing, consensus, redis, zookeeper
Reliability · Building block · Advanced

Circuit Breakers (State Machines, Hysteresis, and Fast Failure)

Design circuit breakers that actually stabilize a fleet: rolling windows, half-open probes, dependency-scoped state, and clean interaction with retries and load shedding.

Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Builds on

Idempotency and Retries (Without Multiplying Load)

Learning paths

1 curated path currently includes this deep dive.

Unlocks

Load Shedding (Protecting Latency Under Saturation); Feedback Control for Autoscaling and Load Shedding

circuit-breaker, timeouts, half-open, hysteresis, resilience, dependency-isolation
Reliability · Building block · Advanced

Feedback Control for Autoscaling and Load Shedding

Use PI/PID ideas the way production systems actually do: filtered signals, clamped actions, weak predictive bias, and layered controllers instead of textbook loops.

Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Builds on

Designing a Rate Limiter (at Scale, Production-Grade)

Learning paths

2 curated paths currently include this deep dive.

Unlocks

Load Shedding (Protecting Latency Under Saturation); Anti-Windup, Hysteresis, and Oscillation in Distributed Control Loops

feedback-control, pid, pi, autoscaling, load-shedding, ewma
Reliability · Building block · Advanced

Idempotency and Retries (Without Multiplying Load)

Build a retry stack that survives crashes, duplicate delivery, and partial completion without turning transient failure into write amplification and data corruption.

Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Read now

Self-contained enough to open without another page first.

Learning paths

1 curated path currently includes this deep dive.

Unlocks

Circuit Breakers (State Machines, Hysteresis, and Fast Failure)

idempotency, retries, exactly-once, backoff, outbox, deduplication
Reliability · Trade-off · Advanced

Anti-Windup, Hysteresis, and Oscillation in Distributed Control Loops

Stabilize real control loops under delay and saturation: clamp integrators, separate thresholds, detect oscillation cheaply, and adapt gains before the system starts flapping.

Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.

Builds on

Feedback Control for Autoscaling and Load Shedding

Learning paths

2 curated paths currently include this deep dive.

anti-windup, hysteresis, oscillation, control-loops, autoscaling, stability