system-design-architect
Distributed systems design architect. CAP theorem, consistency models, caching patterns, sharding, message queues, circuit breakers, load balancing, and the numbers every engineer should know.
Installation

```shell
npx clawhub@latest install system-design-architect
```

View the full skill documentation and source below.
Documentation
System Design Architect
The Design Framework (Use for Every System)
1. Requirements (5-10 min)
- Functional: What does the system DO?
- Non-functional: Scale, latency, consistency, availability
2. Estimates (2-5 min)
- QPS (queries per second) — read vs write ratio
- Storage: data size * growth rate * retention
- Bandwidth: QPS * payload size
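The estimate formulas above can be sketched as quick arithmetic. All inputs here (DAU, request counts, payload sizes, retention) are made-up illustrative assumptions:

```python
# Back-of-envelope estimation helpers; every input below is an assumption
SECONDS_PER_DAY = 86_400

def qps(requests_per_day: float) -> float:
    return requests_per_day / SECONDS_PER_DAY

# Example: 10M DAU, each making 20 reads and 2 writes per day
daily_reads = 10_000_000 * 20
daily_writes = 10_000_000 * 2
read_qps = qps(daily_reads)    # ~2,315 QPS
write_qps = qps(daily_writes)  # ~231 QPS

# Storage: 1 KB per write, 5 years retention
storage_bytes = daily_writes * 1_000 * 365 * 5  # 36.5 TB

# Bandwidth: read QPS * 2 KB average payload
read_bandwidth = read_qps * 2_000  # ~4.6 MB/s
```

Keeping the inputs round makes the mental math checkable in an interview.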
3. High-Level Design (10-15 min)
- API design: endpoints, request/response
- Core components: draw the boxes
- Data flow: how does a request travel?
4. Deep Dive (20-30 min)
- Bottlenecks and how to solve them
- Database choice and schema
- Caching strategy
- Failure modes and resilience
5. Trade-offs
- What you chose and why
- What you sacrificed
- What you'd change with more time
Numbers Every Engineer Should Know
L1 cache hit: 1 ns
L2 cache hit: 4 ns
L3 cache hit: 10 ns
RAM read: 100 ns
NVMe SSD read: 120 µs
Network (same datacenter): 500 µs
Network (cross-region): 50-150 ms
Disk seek (HDD): 10 ms
Order of magnitude:
1 server handles: ~100,000 req/s (simple)
1 database handles: ~10,000 QPS (Postgres, with indexes)
Redis: ~1,000,000 ops/s
1 Kafka partition: ~100 MB/s throughput
Storage rough estimates:
1 char = 1 byte
1 tweet-sized object = 256 bytes
1 photo = 300 KB
1 minute HD video = 50 MB
1 TB = 10^12 bytes
Scale estimates:
Twitter: 500M tweets/day = ~6K tweets/s
YouTube: 500 hours of video uploaded per minute
Instagram: 100M photos/day = ~1,000 photos/s
CAP Theorem & Consistency Models
CAP: A distributed system can guarantee only 2 of 3:
C (Consistency) — All nodes see the same data simultaneously
A (Availability) — Every request gets a (non-error) response
P (Partition tolerance) — System works when network partitions occur
Since network partitions ALWAYS happen, choose CP or AP:
CP: Consistent + Partition-tolerant (accept downtime over stale data)
Examples: HBase, ZooKeeper, Etcd, MongoDB
AP: Available + Partition-tolerant (accept stale data over downtime)
Examples: Cassandra, DynamoDB, CouchDB, DNS
Consistency Levels (Ordered Strong → Weak)
| Level | Description | Use When |
|---|---|---|
| Linearizable | Like a single machine; reads always see the latest write | Financial transactions |
| Sequential | All nodes see the same order, not necessarily in real time | Collaborative editing |
| Causal | Causally related ops are seen in order | Social feeds |
| Eventual | Nodes converge eventually | DNS, social likes |
| Read-your-writes | A client always sees its own writes (session guarantee) | Profile updates |
| Monotonic reads | A client's reads never go backwards in time (session guarantee) | Social timelines |
Load Balancing
Algorithms:
Round Robin → simple, works if servers are identical
Weighted RR → different server capacities
Least Connections → server with fewest active connections
IP Hash → same client → same server (session stickiness)
Consistent Hashing → distribute with minimal redistribution on changes
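Consistent hashing from the list above can be sketched as a sorted ring of virtual nodes. This is a simplified illustration (MD5 and 100 vnodes are arbitrary choices), not a production implementation:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions on the ring
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def get_node(self, key: str) -> str:
        # First vnode clockwise from the key's hash (wrapping around)
        idx = bisect.bisect(self.keys, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
server = ring.get_node("user:42")  # Same key always maps to the same server
```

Adding or removing a node only remaps the keys between its vnodes and their ring neighbors, which is the "minimal redistribution" property.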
Layer 4 (Transport): Route by TCP/IP. Fast, no app visibility.
Use for: TCP traffic, very high throughput
Layer 7 (Application): Route by HTTP headers, URL, cookies.
Use for: HTTP services, A/B testing, canary deployments
Health checks:
Active: LB sends probe requests every 5-30s
Passive: Monitor actual traffic for errors
Session stickiness options:
1. Stateless apps (best) — any server can handle any request
2. Server-side sessions + shared cache (Redis)
3. JWT tokens — client carries state, stateless servers
Caching Strategy
Cache Tiers
Browser → CDN → API Cache → Application Cache → Database
CDN (CloudFront, Fastly, Cloudflare):
- Static assets: CSS, JS, images, videos
- Cache-Control: max-age=31536000 (1 year) for versioned assets
- Purge on deploy
Application Cache (Redis/Memcached):
- Session data
- API response cache
- Computed results, leaderboards, counters
Database Query Cache:
- Materialized views
- Query result caching
Cache Patterns
Cache-Aside (Lazy Loading):
1. Read from cache
2. Miss: read from DB, write to cache, return
3. TTL-based invalidation
Good for: read-heavy, can tolerate stale data
Problem: cache miss on first request, thundering herd
Write-Through:
1. Write to cache AND DB synchronously
Good for: consistency, no stale reads
Problem: write latency, cache full of data never read
Write-Behind (Async):
1. Write to cache immediately (return success)
2. Async: flush to DB
Good for: write performance
Problem: data loss risk if cache fails
Read-Through:
Cache sits in front of DB, handles misses
Good for: simple app code (cache is transparent)
Refresh-Ahead:
Proactively refresh before expiry
Good for: predictable access patterns, critical data
```python
# Cache-aside pattern (redis-py; `db` is a placeholder data-access layer)
import json

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def get_user(user_id: str) -> dict:
    cache_key = f"user:{user_id}"
    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    # Cache miss — fetch from DB
    user = db.query_user(user_id)  # Slow
    if user:
        r.setex(cache_key, 3600, json.dumps(user))  # TTL: 1 hour
    return user

# Cache invalidation on write
def update_user(user_id: str, data: dict):
    db.update_user(user_id, data)
    r.delete(f"user:{user_id}")  # Invalidate cache
    # Or: r.set(f"user:{user_id}", json.dumps(updated_user), ex=3600)
```
Cache Busting Problems
Thundering herd: Cache expires, 10K requests hit DB simultaneously
Solution:
- Cache lock: Only 1 request rebuilds cache, others wait
- Jitter: Randomize TTL ±10% to stagger expiry
- Cache warming: Proactively rebuild before expiry
Cache penetration: Request for non-existent data bypasses cache, hits DB every time
Solution: Cache negative results (NULL) with short TTL (60s)
Cache stampede: Cache server restart / mass invalidation causes all caches to miss at once
Solution: Circuit breaker on the DB, request coalescing (single rebuild per key), cache warming script
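The jitter and per-key lock mitigations above can be sketched as follows (the TTL, ±10% spread, and helper names are illustrative):

```python
import random
import threading

BASE_TTL = 3600

def jittered_ttl(base: int = BASE_TTL, spread: float = 0.10) -> int:
    # Randomize TTL ±10% so keys written together don't expire together
    return int(base * random.uniform(1 - spread, 1 + spread))

# Per-key rebuild locks: only one thread recomputes a missing key;
# the rest block on the lock, then re-read the (now warm) cache
_rebuild_locks: dict = {}
_registry_lock = threading.Lock()

def lock_for(key: str) -> threading.Lock:
    with _registry_lock:
        return _rebuild_locks.setdefault(key, threading.Lock())
```

In a multi-server deployment the rebuild lock would live in Redis (e.g., `SET key NX EX`) rather than in process memory.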
Database Design Patterns
Sharding Strategies
Horizontal Sharding (partitioning rows across databases):
1. Range-based sharding
- User IDs 1-1M → Shard 1, 1M-2M → Shard 2
- Problem: Hot spots if range is popular
2. Hash-based sharding
- shard = hash(user_id) % num_shards
- Problem: Resharding requires migrating data
3. Consistent hashing
- Virtual nodes on a ring
- Adding a shard only moves 1/N data
- Used by: DynamoDB, Cassandra, Redis Cluster
4. Directory-based sharding
- Lookup table: user_id → shard_id
- Flexible, but lookup table is a bottleneck
Vertical sharding: Split by table/feature (users in DB1, orders in DB2)
Good for: feature isolation, different scaling needs
Problem: Cross-shard joins are expensive (do in app layer)
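Hash-based sharding (strategy 2 above) and its resharding cost can be demonstrated directly: modulo hashing moves most keys when the shard count changes, which is exactly what consistent hashing avoids.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    # Hash-based sharding: stable only for a fixed shard count
    return hash(user_id) % num_shards

# Resharding from 4 to 5 shards moves most keys
moved = sum(
    1 for uid in range(10_000)
    if shard_for(uid, 4) != shard_for(uid, 5)
)
fraction_moved = moved / 10_000  # 0.8 for this range of IDs
```

With consistent hashing, going from 4 to 5 shards would move roughly 1/5 of the keys instead of 4/5.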
Read Replicas
Primary (read + write) → Replica 1 (read only)
                       → Replica 2 (read only)
                       → Replica 3 (read only + analytics)
Route reads to replicas:
- Most reads go to replicas
- Writes always go to primary
- Inconsistency window: replication lag (usually <1s)
Use read-your-writes routing:
- After a write, route that user's reads to primary for a short time
- Or: read from primary for operations that must see latest data
Message Queues and Event-Driven Architecture
Point-to-point queue (RabbitMQ, SQS):
Producer → [Queue] → Consumer (one consumer per message)
Use for: task distribution, work queues, one-time processing
Pub/Sub topic (Kafka, SNS, Google Pub/Sub):
Producer → [Topic] → Consumer Group 1 (all messages)
                   → Consumer Group 2 (all messages)
Use for: event streaming, audit logs, multiple consumers
Kafka key concepts:
Topic: named stream of events
Partition: parallel lane within a topic (ordered within partition)
Consumer group: set of consumers sharing partition processing
Offset: position within a partition (Kafka stores until retention)
Ordering guarantee:
- Within a partition: always ordered
- Across partitions: no ordering (use single partition for strict ordering)
- To keep related events together: partition by entity ID (user_id, order_id)
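Partitioning by entity ID can be sketched without a broker. This mimics key-based partition selection; the hash function is illustrative (real Kafka clients use murmur2, not MD5):

```python
import hashlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash: the same entity always lands in the same partition,
    # so its events stay ordered relative to each other
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# All events for one order hit one partition (ordered);
# different orders spread across partitions (parallel)
p = partition_for("order:1001")
```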
Event-Driven Patterns
Event Sourcing:
Store ALL events, not just current state
State = replay of all events from beginning
+ Complete audit trail
+ Easy to rebuild state at any point in time
+ Event-driven by design
- Replay can be slow (snapshots solve this)
- Schema evolution is hard
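Rebuilding state by replaying events can be sketched as a left-fold over the event log (the event shapes here are made up for illustration):

```python
from functools import reduce

# Event sourcing: current state is derived by replaying the log
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrew",  "amount": 30},
    {"type": "Deposited", "amount": 5},
]

def apply(balance: int, event: dict) -> int:
    if event["type"] == "Deposited":
        return balance + event["amount"]
    if event["type"] == "Withdrew":
        return balance - event["amount"]
    return balance  # Unknown events are ignored

balance = reduce(apply, events, 0)  # 75, replayed from the log, never stored
```

A snapshot is just the fold result at some offset, so replay can resume from the snapshot instead of event zero.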
CQRS (Command Query Responsibility Segregation):
Commands → Write Model (normalized, consistent)
Queries → Read Model (denormalized, fast, eventual)
+ Read model optimized for queries
+ Write model optimized for consistency
- Eventual consistency between models
- More complexity
Saga Pattern (distributed transactions):
Choreography: Each service emits events, others react
Orchestration: Central coordinator tells services what to do
Order Service → emit OrderCreated
              ← Inventory Service reserves items
              ← Payment Service charges card
              ← Shipping Service creates shipment
If any step fails: compensating transactions to undo
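An orchestrated saga with compensating transactions can be sketched like this; the service calls are stand-in functions, not a real framework:

```python
def run_saga(steps):
    # Orchestrator: run each (action, compensation) step in order;
    # on failure, run compensations for completed steps in reverse
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()  # Compensating transaction
            return False
    return True

log = []

def reserve_items(): log.append("reserve_items")
def release_items(): log.append("release_items")
def charge_card():   raise RuntimeError("card declined")  # Simulated failure
def refund():        log.append("refund")

ok = run_saga([
    (reserve_items, release_items),
    (charge_card, refund),
])
# ok is False; the reservation was compensated:
# log == ["reserve_items", "release_items"]
```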
Circuit Breaker Pattern
```python
from enum import Enum
import time
from threading import Lock

class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call while open."""

class State(Enum):
    CLOSED = "closed"        # Normal operation, requests flow
    OPEN = "open"            # Failing, reject requests fast
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold=5, success_threshold=2, timeout=60):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self.state = State.CLOSED
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = None
        self._lock = Lock()

    def call(self, func, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                if time.time() - self.last_failure_time > self.timeout:
                    self.state = State.HALF_OPEN
                    self.success_count = 0
                else:
                    raise CircuitOpenError("Circuit is open")
        # Invoke outside the lock so a slow downstream doesn't serialize callers
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        with self._lock:
            if self.state == State.HALF_OPEN:
                self.success_count += 1
                if self.success_count >= self.success_threshold:
                    self.state = State.CLOSED
                    self.failure_count = 0
            else:
                self.failure_count = 0  # A success in CLOSED resets the count

    def _on_failure(self):
        with self._lock:
            self.failure_count += 1
            self.last_failure_time = time.time()
            if self.failure_count >= self.failure_threshold:
                self.state = State.OPEN
```
Design Patterns for Common Systems
URL Shortener (bit.ly)
Key decisions:
1. ID generation: Base62 encode auto-increment OR random 7-char string
2. Storage: Redis for hot links, SQL for persistence
3. Redirect: 301 (permanent, cached by browser) vs 302 (temporary, track every click)
4. Scale: 100M URLs, 10B redirects/day = ~115K redirects/s
Schema:

```sql
CREATE TABLE urls (
    short_code  TEXT PRIMARY KEY,
    long_url    TEXT NOT NULL,
    created_at  TIMESTAMP,
    expires_at  TIMESTAMP,
    click_count INTEGER  -- eventually consistent counter
);
```
Bottleneck: 115K reads/s → Redis cache with LRU eviction
Cache key: short_code → long_url
Cache hit ratio goal: 95%+ (Zipf distribution: 20% of links get 80% traffic)
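Base62-encoding an auto-increment ID (decision 1 above) can be sketched as:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    # Encode an auto-increment ID as a short code
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

code = base62(125)  # "21", since 2*62 + 1 = 125
```

Seven base62 characters give 62^7 ≈ 3.5 trillion codes, comfortably above the 100M URLs estimated above. Sequential IDs are guessable, which is why the random-string alternative exists.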
Rate Limiter
Token bucket: Most common
- Bucket starts with N tokens
- Refill at rate R tokens/second
- Request consumes 1 token; reject if empty
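A single-process token bucket, before moving it into Redis for cross-server atomicity, can be sketched as (capacity and rate values are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # N tokens max (burst size)
        self.refill_rate = refill_rate  # R tokens/second (sustained rate)
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1)  # 5-request burst, 1 req/s sustained
```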
Implementation: Redis + Lua script (atomic)
Fixed window: Simple
- Count requests per minute window
- Problem: burst at window boundary (2x limit in 2s)
Sliding window log:
- Store timestamp of each request
- Count requests in last N seconds
- Problem: memory-intensive (1 entry per request)
Sliding window counter:
- Interpolate between two fixed windows
- Balance of accuracy and memory
Redis Lua script (atomic; a simplified sketch in which key expiry acts as the refill, so it behaves closer to a fixed window than a continuously refilling bucket):

```lua
local tokens = tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
if tokens > 0 then
    redis.call('SET', KEYS[1], tokens - 1, 'EX', ARGV[2])
    return 1  -- allowed
end
return 0  -- rejected
```
Reliability Patterns
SLA/SLO/SLI:
SLI: What you measure (e.g., fraction of requests served in <200ms)
SLO: Internal target for that SLI (e.g., 99.5% over a month)
SLA: External contract, with consequences if the SLO is missed
Error budget:
99.9% availability = 8.7 hours downtime/year
99.99% = 52 minutes/year
Use error budget to decide when to ship new features vs focus on reliability
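The downtime figures above follow from simple arithmetic:

```python
def downtime_hours_per_year(availability: float) -> float:
    # Allowed downtime per year for a given availability target
    return (1 - availability) * 365 * 24

three_nines = downtime_hours_per_year(0.999)   # ~8.76 hours/year
four_nines = downtime_hours_per_year(0.9999)   # ~0.88 hours ≈ 53 minutes/year
```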
Bulkhead pattern: Isolate failures
- Separate thread pools per downstream dependency
- Database timeouts can't exhaust HTTP thread pool
Retry with exponential backoff:
wait = min(cap, base * 2^attempt) + jitter
jitter prevents synchronized retries from causing thundering herd
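The backoff formula can be sketched directly; the base, cap, and jitter range are illustrative:

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    # wait = min(cap, base * 2^attempt) + jitter
    wait = min(cap, base * (2 ** attempt))
    return wait + random.uniform(0, base)  # Jitter de-synchronizes retriers
```

A common stronger variant ("full jitter") instead draws the whole delay uniformly from [0, min(cap, base * 2^attempt)].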
Health checks:
Shallow: "Am I up?" → responds 200 (load balancer check)
Deep: "Are my dependencies up?" → checks DB, cache, downstream (startup check only)