System Design 101 โ Building Blocks of Scalable Systems
System design interviews can feel overwhelming, but every complex system is built from the same fundamental components. Once you understand these building blocks, you can reason about any architecture.

This post covers the core components you'll see in every system design โ with practical explanations of why each exists and when to use it.
The Big Picture
Before diving into individual components, here's how they fit together in a typical web-scale system:

Every request flows through this chain. Let me break down each component.
1. Load Balancer โ "Distribute the Traffic"
A load balancer sits in front of your servers and distributes incoming requests across multiple instances. It's the reason Netflix doesn't crash when 100 million people hit play at 8 PM.
How It Works
Client Request
โ
โโโโโโโโโโโโโโโโโโโ
โ Load Balancer โ
โโโโโโโโโโโโโโโโโโโ
โ โ โ
Server1 Server2 Server3
Common Algorithms
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Requests rotate through servers in order | Equal-capacity servers |
| Least Connections | Sends to the server with fewest active connections | Varying request complexity |
| IP Hash | Same client IP always goes to same server | Session persistence |
| Weighted | More traffic to beefier servers | Mixed hardware |
Key Insight
Load balancers also handle health checks โ they stop sending traffic to unhealthy servers automatically. This is how you achieve high availability without manual intervention.
# Simplified health check logic
class LoadBalancer:
def __init__(self, servers: list[Server]):
self.servers = servers
def get_healthy_server(self) -> Server:
healthy = [s for s in self.servers if s.health_check()]
return min(healthy, key=lambda s: s.active_connections)
2. Caching โ "Don't Compute the Same Thing Twice"
Caching stores frequently accessed data in fast storage (usually memory) to avoid hitting slower databases or APIs repeatedly.
Cache Tiers
Request โ L1 (In-Memory, ~1ms) โ L2 (Redis, ~5ms) โ Database (~50ms)
Cache Strategies
Cache-Aside (Lazy Loading) โ The most common pattern:
async def get_user(user_id: str) -> User:
# 1. Check cache first
cached = await redis.get(f"user:{user_id}")
if cached:
return User.from_json(cached)
# 2. Cache miss โ fetch from database
user = await db.query("SELECT * FROM users WHERE id = ?", user_id)
# 3. Store in cache for next time (expire in 5 minutes)
await redis.set(f"user:{user_id}", user.to_json(), ex=300)
return user
Write-Through โ Write to cache and database simultaneously:
async def update_user(user_id: str, data: dict):
# Write to both โ cache is always fresh
await db.update("users", user_id, data)
await redis.set(f"user:{user_id}", json.dumps(data), ex=300)
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." โ Phil Karlton
Common invalidation strategies:
| Strategy | Mechanism | Trade-off |
|---|---|---|
| TTL (Time-to-Live) | Expires after N seconds | Simple but stale reads possible |
| Event-based | Invalidate on write events | Fresh data but more complex |
| Version-based | Key includes version number | Precise but coordination needed |
3. Database โ "Where the Truth Lives"
SQL vs NoSQL โ The Real Decision Framework
Don't pick your database based on hype. Pick it based on your access patterns:
| Choose SQL When | Choose NoSQL When |
|---|---|
| Data has relationships (joins) | Data is denormalized/nested |
| You need ACID transactions | You need horizontal scaling |
| Schema is well-defined | Schema evolves rapidly |
| Complex queries needed | Simple key-value lookups |
Database Scaling Patterns
Read Replicas โ Scale reads by copying data to replica databases:
Writes โ Primary DB
Reads โ Replica 1, Replica 2, Replica 3
This is the first scaling move most systems make. It works because most applications are read-heavy (~90% reads, ~10% writes).
Sharding โ Split data across multiple databases by a key:
Users A-M โ Shard 1
Users N-Z โ Shard 2
โ ๏ธ Warning: Sharding adds massive complexity. Don't shard until you absolutely have to. A single well-optimized Postgres instance can handle millions of rows.
4. Message Queue โ "Do It Later"
Message queues decouple producers from consumers, letting you process work asynchronously.
Why Queues Matter
Without a queue, if your email service is slow, your entire API slows down:
# โ Synchronous โ user waits for email to send
POST /signup โ Create User โ Send Email โ Return 200
โ
Slow! (2-5 seconds)
With a queue, the API returns instantly:
# โ
Asynchronous โ user gets instant response
POST /signup โ Create User โ Push to Queue โ Return 200
โ
Worker picks up โ Sends Email
Queue Pattern in Practice
# Producer โ API server
async def handle_signup(request):
user = await create_user(request.data)
# Don't send email now โ push to queue
await queue.publish("emails", {
"type": "welcome",
"to": user.email,
"name": user.name,
})
return {"status": "created"} # Returns in ~50ms
# Consumer โ Background worker
async def email_worker():
async for message in queue.subscribe("emails"):
await send_email(
to=message["to"],
template=message["type"],
data=message,
)
When to Use Queues
- Sending emails/notifications
- Processing images/videos
- Syncing data between services
- Any work that doesn't need an immediate response
5. CDN โ "Serve Content Close to Users"
A Content Delivery Network caches your static content (images, CSS, JS) on servers worldwide, so users download from a nearby server instead of your origin.
Impact
Without CDN: User in Tokyo โ Server in Virginia โ 200ms latency
With CDN: User in Tokyo โ CDN edge in Tokyo โ 20ms latency
That's a 10x improvement just by putting a CDN in front of your static assets.
What to Put on a CDN
- โ Images, videos, fonts
- โ CSS and JavaScript bundles
- โ Static HTML pages
- โ API responses that rarely change
- โ User-specific dynamic content
- โ Real-time data
Putting It All Together
Here's how these components work for a real system โ let's say a social media feed:
- CDN serves profile images and static assets
- Load Balancer routes API requests across server instances
- API Servers handle business logic
- Cache (Redis) stores pre-computed feeds and session data
- Database stores users, posts, and relationships
- Message Queue handles async work โ push notifications, feed updates, email digests
Scaling Checklist
When your system needs to scale, follow this order:
- Add caching (biggest bang for buck)
- Add read replicas (scale reads)
- Add a CDN (offload static content)
- Add message queues (decouple and go async)
- Add more app servers + load balancer (horizontal scaling)
- Shard the database (last resort โ high complexity)
Key Takeaways
| Component | One-Line Summary |
|---|---|
| Load Balancer | Distributes traffic, enables horizontal scaling |
| Cache | Stores hot data in memory, reduces database load |
| Database | Persistent storage โ pick SQL or NoSQL based on access patterns |
| Message Queue | Decouples services, enables async processing |
| CDN | Serves static content from edge locations near users |
The beauty of these building blocks is that they compose. Start simple โ a single server with a database โ and add components as your scale demands them. Every engineering decision is a trade-off, and the best architecture is the simplest one that meets your requirements.
Next up: I'll dive deeper into database scaling patterns and when to actually shard. Follow me on LinkedIn to stay updated.