System design
Learn production systems by mechanism and trade-off, not by memorizing one canned answer per interview prompt.
Deep dives: 10. Production-focused write-ups that stay concrete about APIs, storage, and failure handling.
Families: 3. Follow topics by system concern instead of treating every interview design as an isolated whiteboard prompt.
End-to-end designs: 4. These pages optimize for production architecture, not just naming the right buzzwords in an interview.
Learning paths: 3. Curated sequences keep the next deep technical topic obvious as the atlas expands.
Understand token-based rate limiting mathematically: saturated integrators, debt-space duals, and why token bucket and GCRA (the Generic Cell Rate Algorithm) are the same policy in different coordinates.
Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Read now
Self-contained enough to open without another page first.
Learning paths
2 curated paths currently include this deep dive.
Unlocks
Designing a Rate Limiter (at Scale, Production-Grade)
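The token bucket / GCRA equivalence from the first card above can be checked numerically. The sketch below is an illustration, not code from the deep dive: the class names, the helper `tokens_from_debt`, starting from a full bucket, and the use of `Fraction` for exact time arithmetic are all assumptions. GCRA's tolerance is taken as `(burst - 1) / rate`, which lets it admit the same burst as a bucket of capacity `burst`.

```python
from fractions import Fraction


class TokenBucket:
    """Credit coordinates: store tokens, refill at `rate`, saturate at `burst`."""

    def __init__(self, rate: Fraction, burst: int) -> None:
        self.rate = rate                  # tokens added per second
        self.burst = Fraction(burst)      # capacity (maximum credit)
        self.tokens = Fraction(burst)     # assumption: start with a full bucket
        self.last = Fraction(0)           # time of the previous update

    def allow(self, now: Fraction) -> bool:
        # Integrate the refill, clamped at capacity (the saturated integrator).
        self.tokens = min(self.burst, self.tokens + self.rate * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # rejected requests consume nothing


class GCRA:
    """Debt coordinates: store a theoretical arrival time (TAT) instead of tokens."""

    def __init__(self, rate: Fraction, burst: int) -> None:
        self.interval = 1 / rate                      # emission interval per request
        self.tolerance = (burst - 1) * self.interval  # how far ahead TAT may run
        self.tat = Fraction(0)                        # zero debt, i.e. a full bucket

    def debt(self, now: Fraction) -> Fraction:
        # Time-denominated debt; the floor at zero is the dual of the capacity clamp.
        return max(Fraction(0), self.tat - now)

    def allow(self, now: Fraction) -> bool:
        if self.debt(now) > self.tolerance:
            return False                  # too far in debt: reject, state unchanged
        self.tat = max(self.tat, now) + self.interval
        return True


def tokens_from_debt(g: GCRA, rate: Fraction, burst: int, now: Fraction) -> Fraction:
    # Hypothetical helper for the change of coordinates: tokens = burst - rate * debt.
    return burst - rate * g.debt(now)


rate, burst = Fraction(2), 3
tb, g = TokenBucket(rate, burst), GCRA(rate, burst)
decisions = []
for now in [Fraction(t, 10) for t in (0, 0, 0, 0, 3, 5, 5, 20)]:
    a, b = tb.allow(now), g.allow(now)
    assert a == b                                              # same decision
    assert tb.tokens == tokens_from_debt(g, rate, burst, now)  # same state
    decisions.append(a)
print(decisions)  # [True, True, True, False, False, True, False, True]
```

The assertions hold because GCRA's debt, `max(0, tat - now)`, is the bucket's missing credit measured in seconds: refilling by `rate * dt` is the same as paying down `dt` of debt, and clamping tokens at capacity is the same as flooring debt at zero.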
Designing a Rate Limiter (at Scale, Production-Grade)
Design a limiter that is actually deployable: low-latency enforcement, burst handling, distributed quotas, multi-region coordination, and failure-safe behavior.
Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Read now
Self-contained enough to open without another page first.
Learning paths
3 curated paths currently include this deep dive.
Unlocks
Global Quotas (Hierarchical Budgets Across Regions and Fleets), Load Shedding (Protecting Latency Under Saturation), Feedback Control for Autoscaling and Load Shedding
Global Quotas (Hierarchical Budgets Across Regions and Fleets)
Design worldwide quotas without putting a globally serialized dependency in the request path, using hierarchical allocation, leased budgets, and bounded overshoot.
Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Builds on
Designing a Rate Limiter (at Scale, Production-Grade), Distributed Locking (Leases, Fencing Tokens, and When Not to Use It)
Learning paths
3 curated paths currently include this deep dive.
Load Shedding (Protecting Latency Under Saturation)
Design admission control that drops the right work at the right time, using concurrency, queue depth, cost, and priority instead of letting the service fail slowly.
Traffic management lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Builds on
Circuit Breakers (State Machines, Hysteresis, and Fast Failure)
Learning paths
2 curated paths currently include this deep dive.
Design a feature flag platform that supports low-latency local evaluation, strong auditability, deterministic targeting, and safe configuration rollouts across a fleet.
Control plane lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Read now
Self-contained enough to open without another page first.
Learning paths
1 curated path currently includes this deep dive.
Unlocks
Global Quotas (Hierarchical Budgets Across Regions and Fleets)
Distributed Locking (Leases, Fencing Tokens, and When Not to Use It)
Design distributed locking with explicit guarantees, stale-owner protection, and realistic failure semantics instead of assuming a lock magically creates correctness.
Control plane lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Read now
Self-contained enough to open without another page first.
Learning paths
1 curated path currently includes this deep dive.
Unlocks
Global Quotas (Hierarchical Budgets Across Regions and Fleets)
Circuit Breakers (State Machines, Hysteresis, and Fast Failure)
Design circuit breakers that actually stabilize a fleet: rolling windows, half-open probes, dependency-scoped state, and clean interaction with retries and load shedding.
Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Builds on
Idempotency and Retries (Without Multiplying Load)
Learning paths
1 curated path currently includes this deep dive.
Unlocks
Load Shedding (Protecting Latency Under Saturation), Feedback Control for Autoscaling and Load Shedding
Feedback Control for Autoscaling and Load Shedding
Use PI/PID ideas the way production systems actually do: filtered signals, clamped actions, weak predictive bias, and layered controllers instead of textbook loops.
Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Builds on
Designing a Rate Limiter (at Scale, Production-Grade)
Learning paths
2 curated paths currently include this deep dive.
Unlocks
Load Shedding (Protecting Latency Under Saturation), Anti-Windup, Hysteresis, and Oscillation in Distributed Control Loops
Idempotency and Retries (Without Multiplying Load)
Build a retry stack that survives crashes, duplicate delivery, and partial completion without turning transient failure into write amplification and data corruption.
Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Read now
Self-contained enough to open without another page first.
Learning paths
1 curated path currently includes this deep dive.
Unlocks
Circuit Breakers (State Machines, Hysteresis, and Fast Failure)
Anti-Windup, Hysteresis, and Oscillation in Distributed Control Loops
Stabilize real control loops under delay and saturation: clamp integrators, separate thresholds, detect oscillation cheaply, and adapt gains before the system starts flapping.
Reliability lives inside Distributed systems, so broader distributed-systems trade-offs still matter here.
Builds on
Feedback Control for Autoscaling and Load Shedding
Learning paths
2 curated paths currently include this deep dive.
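Three of the stabilization techniques named in the last card (clamped integrators, separate thresholds, cheap oscillation detection) fit in a few lines each. This is a minimal sketch under stated assumptions, not the deep dive's code: `ClampedPI`, `HysteresisGate`, and `FlipDetector` are hypothetical names, clamping the integral to the actuator range is only one of several anti-windup schemes, and gain adaptation is left out.

```python
from collections import deque
from dataclasses import dataclass


def clamp(x: float, lo: float, hi: float) -> float:
    return min(hi, max(lo, x))


@dataclass
class ClampedPI:
    """PI controller whose integrator is clamped to the actuator range (one anti-windup scheme)."""
    kp: float
    ki: float
    out_min: float
    out_max: float
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # The integral can never store more "push" than the actuator can deliver,
        # so it unwinds as soon as the error changes sign.
        self.integral = clamp(self.integral + self.ki * error * dt, self.out_min, self.out_max)
        return clamp(self.kp * error + self.integral, self.out_min, self.out_max)


class HysteresisGate:
    """Separate on/off thresholds so noise around a single threshold cannot flap the output."""

    def __init__(self, low: float, high: float) -> None:
        if not low < high:
            raise ValueError("low threshold must be below high threshold")
        self.low, self.high = low, high
        self.on = False

    def update(self, signal: float) -> bool:
        if not self.on and signal >= self.high:
            self.on = True
        elif self.on and signal <= self.low:
            self.on = False
        return self.on


class FlipDetector:
    """Cheap oscillation check: count error sign changes inside a sliding window."""

    def __init__(self, window: int, max_flips: int) -> None:
        self.signs: deque = deque(maxlen=window)
        self.max_flips = max_flips

    def observe(self, error: float) -> bool:
        if error != 0:                    # zero error carries no sign information
            self.signs.append(error > 0)
        s = list(self.signs)
        flips = sum(1 for a, b in zip(s, s[1:]) if a != b)
        return flips > self.max_flips


pi = ClampedPI(kp=0.5, ki=1.0, out_min=0.0, out_max=10.0)
for _ in range(100):
    pi.step(error=5.0, dt=1.0)        # long saturation: integral stays pinned at 10.0
print(pi.step(error=-1.0, dt=1.0))    # 8.5: output leaves saturation on the first reversed step
```

Without the clamp, the same 100 saturated steps would accumulate an integral of 500, and the output would stay pinned at the limit for hundreds of steps after the error reversed.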