Rohit Marathe

LangChain vs LangGraph: Architecting Stateful Multi-Agent Systems

2026-03-29T00:00:00Z

The landscape of building LLM-powered applications is evolving rapidly. We've moved from simple prompt engineering to complex "chains," and now we're entering the era of truly agentic, multi-agent systems.

When you're architecting these systems, the choice between LangChain and LangGraph is critical. While they share a lineage, their architectural foundations are built for fundamentally different use cases.

The Architectural Divide: Linear vs. Cyclic

At its core, the difference is one of topology: Linear Chains vs. Cyclic Graphs.

Feature	LangChain (Chains)	LangGraph (Graphs)
Topology	Directed Acyclic Graph (DAG)	Cyclic Graph
Flow	Linear / Sequential	Looping / Iterative
State	Implicit / Append-only	Explicit / Schema-based
Cycles	Not natively supported	First-class citizen
Persistence	Basic memory buffers	Checkpointing / Time Travel

LangChain: The Linear Evolution

LangChain is built around the concept of a DAG. You define a sequence of steps—a "chain"—where data flows from one component to the next in a predetermined path. It’s excellent for straightforward transformations and simple RAG (Retrieval-Augmented Generation).

LangGraph: The Agentic Revolution

LangGraph is designed for stateful, cyclic workflows. Instead of a fixed sequence, you define a graph where Nodes represent actions (LLM calls, tools) and Edges define transitions.

Control Flow & Decision Making

Transitions in LangGraph aren't just direct; you use Conditional Edges (routers) to make real-time decisions. For example, a router node can analyze an LLM's output and decide whether to call a tool or move to a final response node.

Advanced State Management

One of the most significant upgrades in LangGraph is how it treats state.

Explicit State Schemas

In LangGraph, you define a State object (using TypedDict or Pydantic). This provides a single source of truth that is passed between every node in the graph.

ReducedValue: Allows you to define custom state reducers (e.g., merging a list of messages instead of overwriting).
UntrackedValue: Useful for transient data that needs to be accessible within a node but shouldn't be persisted in the graph's history.
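
To make the reducer idea concrete, here is a minimal sketch in plain Python. It deliberately does not use LangGraph's API: `REDUCERS`, `apply_update`, and the field names are hypothetical, chosen only to show how a per-field reducer decides between merging and overwriting.

```python
import operator
from typing import Any, Callable

# Hypothetical per-field reducers: "messages" is appended to,
# every other field falls back to plain overwrite.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,  # list + list concatenates
}

def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with each updated field merged by its reducer."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            new_state[key] = reducer(state[key], value)
        else:
            new_state[key] = value  # default behaviour: overwrite
    return new_state

state = {"messages": ["hi"], "step": 1}
state = apply_update(state, {"messages": ["hello!"], "step": 2})
```

After the update, `messages` holds both entries while `step` has simply been replaced, which is the behaviour a message-merging reducer is meant to give you.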

Reliability: Persistence & Human-in-the-Loop

Checkpoints and Time Travel

LangGraph includes built-in Persistence. Every step of the graph is checkpointed. If a system crashes, you can resume exactly where you left off. This also enables Time Travel, allowing you to "rewind" to a previous state, inspect what the agent was thinking, and even branch into a new execution path.
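
The mechanics behind "rewind and branch" can be sketched with an in-memory snapshot list. This is an illustration under my own assumptions, not LangGraph's checkpointer: `CheckpointLog` and its methods are hypothetical names.

```python
import copy
from typing import Any

class CheckpointLog:
    """Hypothetical in-memory checkpoint store: one snapshot per step."""

    def __init__(self) -> None:
        self._snapshots: list[dict[str, Any]] = []

    def save(self, state: dict[str, Any]) -> int:
        # Deep-copy so later mutations cannot alter history.
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1  # checkpoint id

    def load(self, checkpoint_id: int) -> dict[str, Any]:
        return copy.deepcopy(self._snapshots[checkpoint_id])

log = CheckpointLog()
state = {"draft": "v1"}
first = log.save(state)
state["draft"] = "v2"
second = log.save(state)

# "Rewind" to the first checkpoint and branch from it.
branch = log.load(first)
branch["draft"] = "v1-alternative"
```

A real checkpointer would write these snapshots to durable storage so a crashed process can reload the latest one; the in-memory list only demonstrates the rewind-and-branch behaviour.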

Interrupts

For sensitive actions (like executing a shell command or making a transaction), LangGraph supports Interrupts. You can pause the graph, wait for a human to approve the action, and then resume.

Multi-Agent Architecture Patterns

When scaling to multiple agents, LangGraph supports three primary patterns:

Supervisor: A central manager agent delegates tasks to specialized workers.
Collaborative: Agents share a common state and pass control back and forth directly.
Swarm: A decentralized collection of agents that coordinate through shared memory and triggers.

Practical Example: The Self-Correction Loop

A classic use case for LangGraph is a Self-Correction Loop:

Agent Node: Generates code.
Test Node: Executes the code and runs unit tests.
Conditional Edge: If tests pass, move to End. If they fail, route back to the Agent Node with the error log for a fix.
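
The loop above can be sketched in plain Python without any framework. The names `self_correct`, `fake_generate`, and `fake_run_tests` are hypothetical stand-ins for the Agent Node and Test Node, and the `max_attempts` cap is my own addition so the cycle cannot run forever.

```python
from typing import Callable

def self_correct(
    generate: Callable[[str], str],
    run_tests: Callable[[str], str],
    max_attempts: int = 3,
) -> tuple[str, bool]:
    """Loop generate -> test until tests pass or attempts run out."""
    error_log = ""
    code = ""
    for _ in range(max_attempts):
        code = generate(error_log)    # Agent Node
        error_log = run_tests(code)   # Test Node; empty string means "passed"
        if not error_log:             # Conditional Edge
            return code, True
    return code, False

# Stand-ins for an LLM call and a test runner.
def fake_generate(error_log: str) -> str:
    if error_log:
        return "def add(a, b): return a + b"   # "fixed" version
    return "def add(a, b): return a - b"       # buggy first draft

def fake_run_tests(code: str) -> str:
    namespace: dict = {}
    exec(code, namespace)
    return "" if namespace["add"](2, 3) == 5 else "add(2, 3) != 5"

final_code, passed = self_correct(fake_generate, fake_run_tests)
```

The first draft fails, the error log is fed back, and the second attempt passes, which is the cycle a DAG-shaped chain cannot express directly.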

Security & Stability (March 2026 Update)

Stability is as important as architecture. As of March 2026, several critical security advisories have been released concerning AI framework vulnerabilities:

CVE-2026-34070: Path Traversal in legacy prompt loading.
CVE-2025-68664: Serialization injection in dumps() and load().
CVE-2025-67644: Unsafe deserialization of LLM-influenced metadata.

To mitigate these, ensure your systems are running langchain-core (1.2.22+) and langgraph-checkpoint-sqlite (3.0.1+). These versions deprecate unsafe legacy functions and introduce stricter validation for deserialized objects.

Choosing the Right Tool

Choose LangChain if your workflow is a clear, step-by-step process. It’s faster to prototype and easier for simple tasks.
Choose LangGraph if your application needs to handle complex decisions, requires cycles/looping, or needs persistent, reliable execution.

The future of AI is as much about the architectures we build around models as about the models themselves. Moving from chains to graphs is the first step toward building truly intelligent systems.

About the author: Rohit Marathe is an AI Systems Engineer specializing in multi-agent orchestration and large-scale LLM deployments.

Multi-Turn Conversation with LLM

2026-03-24T00:00:00Z

In this post, we will discuss various methods available for handling multi-turn conversations with Large Language Models (LLMs).

With the release of early systems like "Launch Pad," we achieved some level of multi-turn conversation. However, a key question remains: How should the system act and interact?

Currently, most LLM-based experiences work by collecting client information and acknowledging the user with specific responses. Guardrails keep the LLM from going off-topic by holding it rigidly to engineered prompts. However, there are limitations:

Time limits: Often limited to 2 minutes; increasing this can make the LLM go "off the rails."
Task/Goal support: Current implementations often lack clear task or goal handling.
Topic handling: There's often no explicit mechanism for topic switching or management.

Let's explore ways to tackle these limitations and achieve longer, more coherent conversations using Dialog Management.

LLM-Based Agent Framework

One promising approach is a specialized Large Language Model (LLM) tailored for Goal-Oriented Dialogue (GOD), which is particularly effective in complex diagnostic scenarios like medical or legal consultations. Unlike general chatbots, a specialized system can proactively guide users toward specific outcomes by managing dialogue topics and maintaining state.

The Framework

The framework consists of four main components:

Thinking Topic Development
Maintaining Topic Stack
Enriching Topic
Generating Response

1. Thinking Topic Development

When starting a conversation, the first step is picking a topic. The system analyzes the user's query and decides the main topic for the current round of dialogue.

How it works:

Receives user query.
Analyzes previous dialogue context to identify user interest.
Steers the conversation toward a relevant, engaging response.

2. Maintaining Topic Stack

Imagine juggling several conversation threads—talking about a project, planning a meetup, and discussing a movie. Maintaining a "stack" allows the system to track these threads and switch between them without losing context.

How it works:

Maintains a "stack" of topics discussed.
Adds new topics to the top; removes resolved or irrelevant ones.
Manages the flow, ensuring all points are addressed smoothly.
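
A minimal sketch of such a stack, assuming topics are plain strings. `TopicStack` and its method names are hypothetical and only illustrate the push/resolve behaviour described above.

```python
from typing import Optional

class TopicStack:
    """Hypothetical topic stack: the most recent topic sits on top."""

    def __init__(self) -> None:
        self._topics: list[str] = []

    def push(self, topic: str) -> None:
        # Returning to an existing topic moves it to the top
        # instead of duplicating it.
        if topic in self._topics:
            self._topics.remove(topic)
        self._topics.append(topic)

    def resolve(self, topic: str) -> None:
        """Drop a topic that is finished or no longer relevant."""
        if topic in self._topics:
            self._topics.remove(topic)

    @property
    def current(self) -> Optional[str]:
        return self._topics[-1] if self._topics else None

stack = TopicStack()
stack.push("project status")
stack.push("meetup plan")
stack.push("movie")
stack.resolve("movie")  # thread finished, so fall back to the previous topic
```

Once the movie thread is resolved, the conversation falls back to the meetup plan without losing the project thread underneath it.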

3. Enriching Topic

This adds details to make the conversation more engaging. For example, if discussing soccer, enrichment might include player stats or notable moments.

How it works:

Looks for ways to add relevant details to a chosen topic.
Pulls in background info, related topics, or deeper insights.
Keeps the user engaged with comprehensive context.

4. Generating Response

The final step is forming the reply based on the topic, enriched information, and context.

How it works:

Combines the current topic, enriched info, and history.
Uses rules and models to advance the conversation toward user goals.
Considers user preferences and tone for a natural feel.

Example: Collect Basic User Information

Task Definition

{
    "task_name": "Collect Basic User Information",
    "overview": "Collecting essential personal and contact information from a user to create or update their profile.",
    "goal": "Accurately collect and record all necessary user information",
    "checklist": [
        "Ask for user's full name",
        "Request date of birth for eligibility",
        "Inquire about email address",
        "Collect phone number for urgent queries",
        "Ask for home address",
        "Confirm data consent",
        "Summarize info for verification",
        "Provide confirmation number"
    ]
}

Sample Chat History

User: Collect Basic User Information

Virtual Assistant: Thank you for reaching out. May I have your full name, please?

User: My full name is Rohit Marathe

Virtual Assistant: Thank you, Rohit. May I also ask for your date of birth for eligibility verification?

User: Yeah sure, I was born on 25 Aug 1995

Virtual Assistant: Thank you. May I have your email address for communication?

User: rt@gmail.com

Virtual Assistant: And your phone number for urgent inquiries?

User: 407-666-7637

Virtual Assistant: Lastly, what is your residential address?

User: 520 E Church St, Orlando 32801

Agent Prompt Templates

Topic Manager Prompt

MANAGE_TOPIC_PROMPT = """Your objective is to manage dialogue topics between an AI expert and a user.
The topics should stay within the relevant field.
You have a topic list: {topic_list}.
Current topic: {current_topic}.
Tools: {tool_description}.
"""

Enricher Prompt

ENRICH_TOPIC_PROMPT = """Your objective is to enrich dialogue topics.
I will give an original topic and you will provide an enriched version (max 120 words).
Ensure the AI expert can understand it easily to improve the dialogue.
"""

Chat Prompt Template

CHAT_PROMPT_TEMPLATE = """You are a famous expert who guides users through their demands.
Focus on the current topic: {current_topic}.
- If topic starts with 'Ask user:', ask concretely.
- If topic starts with 'Answer user:', provide a theoretical answer, specific suggestions, and follow-up clarification questions.
- If 'Complete goal:', give a comprehensive final answer.
"""
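
To show how the task checklist and the chat template could fit together, here is a small sketch. `CHAT_PROMPT` is a shortened stand-in for the template above, and `next_prompt` is a hypothetical helper, not part of any framework.

```python
# Shortened stand-in for CHAT_PROMPT_TEMPLATE (assumption for brevity).
CHAT_PROMPT = "Focus on the current topic: {current_topic}."

checklist = [
    "Ask for user's full name",
    "Request date of birth for eligibility",
    "Inquire about email address",
]

def next_prompt(items: list[str], completed: set[str]) -> str:
    """Build the prompt for the first unfinished checklist item."""
    for item in items:
        if item not in completed:
            return CHAT_PROMPT.format(current_topic=f"Ask user: {item}")
    # Every item is done, so switch to the goal-completion branch.
    return CHAT_PROMPT.format(current_topic="Complete goal: summarize and confirm")

prompt = next_prompt(checklist, completed={"Ask for user's full name"})
done_prompt = next_prompt(checklist, completed=set(checklist))
```

The 'Ask user:' and 'Complete goal:' prefixes are what let the chat prompt pick the right behaviour for each turn.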

Multi-turn conversation remains a challenging but exciting field in LLM development. By implementing structured topic management, we can move beyond simple Q&A to truly intelligent, goal-oriented assistants.

Hello World — Why I'm Starting This Blog

2025-02-27T00:00:00Z

If you're reading this, welcome! I'm Rohit — an AI Systems Engineer currently building intelligent systems at Rocket Mortgage in Atlanta.

I've spent the last 6+ years working at the intersection of software engineering and artificial intelligence, from building conversational AI bots at Kore.ai to architecting multi-agent systems that handle millions of requests in production.

Why a blog?

I've learned so much from engineers who share their knowledge openly — blog posts that saved me hours of debugging, architecture write-ups that shaped how I think about systems, and tutorials that helped me pick up new tools. It's time I give back.

This blog is where I'll write about:

Building AI systems in production — not just the theory, but the messy reality of deploying LLMs at scale
Multi-agent orchestration — patterns, pitfalls, and what actually works when you need multiple AI agents to collaborate
Cloud engineering — AWS, Kubernetes, and the infrastructure that makes AI systems reliable
Lessons from the field — things I wish I knew earlier, debugging war stories, and career reflections

What I won't do

I'm not going to write generic "What is AI?" explainers. There are plenty of those. Instead, I'll share the things I've learned by actually building and operating these systems — the decisions, trade-offs, and hard-won insights that come from production experience.

Let's connect

If any of this resonates with you, I'd love to connect. Find me on GitHub or LinkedIn, or just keep reading.

Let's build something cool. 🚀

Multi-Agent Orchestration — Patterns That Actually Work

2025-02-27T00:00:00Z

After spending the past year building multi-agent systems at Rocket Mortgage, I've learned that the gap between a demo and production is enormous. Here are the patterns that actually survived contact with real traffic.

The Problem

A single LLM prompt can answer simple questions. But when you need an AI system to handle mortgage calculations, search property databases, analyze banker performance, and diagnose system alerts — you need specialization.

That's where multi-agent orchestration comes in: instead of one monolithic prompt, you build a system of specialized agents that collaborate.

Pattern 1: Router-Based Orchestration

The simplest pattern that works. A central router agent classifies incoming requests and delegates to the appropriate specialist.

class OrchestratorRouter:
    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents
        self.classifier = ClassifierAgent()

    async def handle(self, request: UserRequest) -> Response:
        intent = await self.classifier.classify(request)
        agent = self.agents[intent]
        return await agent.execute(request)

When to use it: When your agents have clearly distinct domains and requests rarely need multiple agents.

The catch: Classification accuracy is everything. A misrouted request gives the user a completely wrong answer with full confidence. We found that adding a confidence threshold and falling back to a general agent for low-confidence classifications cut our error rate by 20%.
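
The confidence-threshold fallback can be sketched as follows. This is not our production code: `Classification`, `pick_agent`, the agent names, and the 0.7 threshold are all hypothetical and would need tuning against real traffic.

```python
from dataclasses import dataclass

@dataclass
class Classification:
    intent: str
    confidence: float  # assumed to be in [0, 1]

def pick_agent(
    result: Classification,
    agents: dict[str, str],
    fallback: str = "general",
    threshold: float = 0.7,  # hypothetical value
) -> str:
    """Route to a specialist only when the classifier is confident enough."""
    if result.confidence < threshold or result.intent not in agents:
        return fallback
    return agents[result.intent]

agents = {"mortgage": "mortgage_agent", "property": "property_agent"}
confident = pick_agent(Classification("mortgage", 0.92), agents)
unsure = pick_agent(Classification("property", 0.41), agents)
unknown = pick_agent(Classification("weather", 0.99), agents)
```

Treating an unknown intent the same way as a low-confidence one keeps the router from raising a `KeyError` on labels it has never seen.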

Pattern 2: Multi-LLM Failover

Production systems can't go down. We built a failover architecture that routes between Claude and GPT-4 based on availability and rate limits.

class MultiLLMClient:
    def __init__(self, providers: list[LLMProvider]):
        self.providers = providers  # ordered by preference

    async def complete(self, prompt: str) -> str:
        for provider in self.providers:
            if provider.is_available():
                try:
                    return await provider.complete(prompt)
                except RateLimitError:
                    provider.mark_limited()
                    continue
        raise AllProvidersExhaustedError()

The key insight: rate-limiting guardrails must be proactive, not reactive. By the time you get a 429 response, you've already wasted latency. We track token consumption in real-time and start routing to the backup provider before hitting the limit.
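
One way to make the guardrail proactive is a sliding-window token counter per provider. The sketch below is an illustration under my own assumptions: `TokenBudget`, the 60-second window, and the 80% headroom are hypothetical choices, not a description of any provider's actual limits.

```python
import time
from collections import deque
from typing import Optional

class TokenBudget:
    """Sliding-window token counter for one provider (hypothetical helper)."""

    def __init__(self, limit_per_minute: int, headroom: float = 0.8) -> None:
        # Switch providers at the soft limit, before the hard limit is hit.
        self.soft_limit = limit_per_minute * headroom
        self._events: deque[tuple[float, int]] = deque()

    def record(self, tokens: int, now: Optional[float] = None) -> None:
        self._events.append((time.monotonic() if now is None else now, tokens))

    def used(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        # Drop events that have left the 60-second window.
        while self._events and now - self._events[0][0] >= 60:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def has_room(self, tokens: int, now: Optional[float] = None) -> bool:
        return self.used(now) + tokens <= self.soft_limit

budget = TokenBudget(limit_per_minute=10_000)
budget.record(7_000, now=0.0)
```

A client would call `has_room()` before each request and move to the backup provider when it returns `False`, instead of waiting for a 429.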

Pattern 3: Self-Healing Agents

This is the one I'm most proud of. We built an agent that monitors system alerts, diagnoses root causes using RAG over our runbooks, and executes remediation scripts autonomously.

The architecture:

Alert ingestion — System alert comes in via webhook
Context retrieval — RAG searches our knowledge base of past incidents and runbooks
Diagnosis — LLM analyzes the alert + retrieved context to identify root cause
Action planning — Agent proposes remediation steps using function calling
Execution — Approved actions are executed via MCP server integration

The critical guardrail: never auto-execute destructive actions. Our agent can restart services and clear caches autonomously, but anything involving data modification requires human approval.
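
A minimal sketch of that guardrail, assuming an allow-list of safe actions. `SAFE_ACTIONS`, `Action`, and `execute_with_guardrail` are hypothetical names, and the `approve` callback stands in for whatever human-approval flow you use.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical allow-list of actions considered safe to run unattended.
SAFE_ACTIONS = {"restart_service", "clear_cache"}

@dataclass
class Action:
    name: str
    target: str

def execute_with_guardrail(
    action: Action,
    run: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run allow-listed actions directly; ask a human before anything else."""
    if action.name in SAFE_ACTIONS or approve(action):
        return run(action)
    return f"skipped {action.name}: approval denied"

def run(action: Action) -> str:
    return f"ran {action.name} on {action.target}"

def deny_all(action: Action) -> bool:
    return False  # stand-in for a real approval prompt

auto = execute_with_guardrail(Action("clear_cache", "api"), run, deny_all)
blocked = execute_with_guardrail(Action("delete_rows", "orders"), run, deny_all)
```

An allow-list fails closed: any action the agent invents that is not explicitly listed goes to a human by default.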

What I'd Do Differently

If I were starting over, I'd invest more in observability from day one. We eventually built an LLM-as-a-Judge pipeline with Langfuse that continuously evaluates agent responses, but we should have had that from the start. When you have multiple agents in a pipeline, debugging "why did the system give a wrong answer?" becomes a multi-step investigation. Good tracing makes this 10x easier.

Key Takeaways

Start simple — Router-based orchestration handles 80% of use cases
Build for failure — Multi-LLM failover is not optional in production
Observe everything — You can't improve what you can't measure
Guardrails are features — The best agent is one that knows when not to act

I'll dive deeper into each of these patterns in future posts. If you're building something similar, I'd love to hear about your approach — reach out on LinkedIn.

Essential Design Patterns Every Engineer Should Know

2025-02-25T00:00:00Z

Design patterns are battle-tested solutions to recurring problems in software design. You don't need to memorize all 23 Gang of Four patterns — but knowing the essential ones will make you a significantly better engineer.

Here are the patterns I use most in production systems, with practical examples that go beyond textbook definitions.

1. Factory Pattern — "Don't `new` Things Directly"

The Factory pattern delegates object creation to a separate method or class, so your code doesn't need to know the exact class it's creating.

Why It Matters

Imagine you're building an AI system that needs to support multiple LLM providers — Claude, GPT-4, and Gemini. Without a factory, every place you create a provider is tightly coupled:

# ❌ Bad — tightly coupled
if provider_name == "claude":
    client = ClaudeClient(api_key=key, model="claude-3")
elif provider_name == "gpt4":
    client = OpenAIClient(api_key=key, model="gpt-4")
elif provider_name == "gemini":
    client = GeminiClient(api_key=key, model="gemini-pro")

With a factory, creation logic lives in one place:

# ✅ Good — Factory pattern
class LLMFactory:
    _providers = {
        "claude": ClaudeClient,
        "gpt4": OpenAIClient,
        "gemini": GeminiClient,
    }

    @classmethod
    def create(cls, provider: str, **kwargs) -> LLMClient:
        if provider not in cls._providers:
            raise ValueError(f"Unknown provider: {provider}")
        return cls._providers[provider](**kwargs)

# Usage — clean and extensible
client = LLMFactory.create("claude", api_key=key, model="claude-3")

When to Use It

You need to create objects without specifying their exact class
You want to centralize complex creation logic
You need to swap implementations easily (testing, A/B experiments)

2. Observer Pattern — "Notify Everyone Who Cares"

The Observer pattern lets objects subscribe to events and get notified when something changes — without the publisher knowing who's listening.

Real-World Example

In a monitoring system, when a deployment completes, multiple systems need to react:

from typing import Callable

class EventBus:
    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = {}

    def subscribe(self, event: str, callback: Callable):
        self._subscribers.setdefault(event, []).append(callback)

    def publish(self, event: str, data: dict):
        for callback in self._subscribers.get(event, []):
            callback(data)

# Set up observers
bus = EventBus()
bus.subscribe("deploy.complete", send_slack_notification)
bus.subscribe("deploy.complete", update_dashboard)
bus.subscribe("deploy.complete", run_smoke_tests)
bus.subscribe("deploy.failed", page_oncall_engineer)

# Publisher doesn't know or care who's listening
bus.publish("deploy.complete", {"service": "api", "version": "2.1.0"})

When to Use It

Multiple components need to react to the same event
You want loose coupling between event producers and consumers
Building notification systems, event-driven architectures, or UI state management

3. Strategy Pattern — "Swap Algorithms at Runtime"

The Strategy pattern lets you define a family of algorithms, put each in its own class, and make them interchangeable.

Real-World Example

Different documents need different chunking strategies for a RAG pipeline:

from abc import ABC, abstractmethod

class ChunkingStrategy(ABC):
    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        pass

class FixedSizeChunking(ChunkingStrategy):
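    def __init__(self, size: int = 512, overlap: int = 50):
        # A step of size - overlap must be positive, or range() below fails
        if not 0 <= overlap < size:
            raise ValueError("overlap must be non-negative and smaller than size")
        self.size = size
        self.overlap = overlap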

    def chunk(self, text: str) -> list[str]:
        chunks = []
        for i in range(0, len(text), self.size - self.overlap):
            chunks.append(text[i:i + self.size])
        return chunks

class SemanticChunking(ChunkingStrategy):
    def chunk(self, text: str) -> list[str]:
        # Split on paragraph boundaries for semantic coherence
        return [p.strip() for p in text.split("\n\n") if p.strip()]

class DocumentProcessor:
    def __init__(self, strategy: ChunkingStrategy):
        self.strategy = strategy

    def process(self, document: str) -> list[str]:
        return self.strategy.chunk(document)

# Swap strategies without changing the processor
processor = DocumentProcessor(SemanticChunking())
chunks = processor.process(long_document)

When to Use It

You have multiple algorithms for the same task
You want to switch behavior at runtime based on context
You're building configurable pipelines (data processing, ML, ETL)

4. Singleton Pattern — "One Instance to Rule Them All"

The Singleton pattern ensures a class has exactly one instance and provides a global access point to it.

When It's Actually Useful

Database connection pools, configuration managers, and logging services are legitimate Singleton use cases:

class DatabasePool:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._pool = create_connection_pool(
                host="db.example.com",
                max_connections=20
            )
        return cls._instance

    def get_connection(self):
        return self._pool.acquire()

# Both references point to the same pool
db1 = DatabasePool()
db2 = DatabasePool()
assert db1 is db2  # True — same instance

⚠️ The Singleton Warning

Singletons are heavily overused. Before reaching for one, ask yourself:

Could I use dependency injection instead? (Usually yes, and it's more testable)
Am I using this as a global variable? (That's a code smell)
Will this make testing harder? (Singletons carry state between tests)

In modern Python, I prefer module-level instances or dependency injection over the classic Singleton pattern.
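
Here is a small sketch of what that can look like. `Settings`, `get_settings`, and `connect` are hypothetical names; the point is that a cached factory gives you one shared instance while the consumer still receives its dependency as an argument.

```python
from functools import lru_cache

class Settings:
    def __init__(self, db_host: str) -> None:
        self.db_host = db_host

@lru_cache(maxsize=None)
def get_settings() -> Settings:
    """Created once on first call, then reused."""
    return Settings(db_host="db.example.com")

def connect(settings: Settings) -> str:
    # The dependency is passed in, so a test can hand over a fake Settings.
    return f"connected to {settings.db_host}"

a = get_settings()
b = get_settings()
message = connect(Settings(db_host="localhost"))
```

Unlike the `__new__` approach, tests can reset the shared instance with `get_settings.cache_clear()` or bypass it entirely by passing their own `Settings`.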

Cheat Sheet

Pattern	Problem It Solves	Key Signal
Factory	Complex object creation	`if/elif` chains creating objects
Observer	Multiple reactions to events	Several components watching for changes
Strategy	Swappable algorithms	`if/elif` chains choosing behavior
Singleton	Exactly one shared instance	Resource pools, config, logging

What About the Rest?

These four cover probably 70% of the patterns I use day-to-day. Other patterns worth learning next:

Decorator — Add behavior to objects dynamically (Python has this built-in with @decorators)
Adapter — Make incompatible interfaces work together
Builder — Construct complex objects step by step
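
As a taste of the Decorator pattern in its Python form, here is a minimal timing decorator. `timed` and `chunk_count` are hypothetical names used only for illustration.

```python
import functools
import time
from typing import Any, Callable

def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Store the duration of the latest call on the wrapper as `last_duration`."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_duration = time.perf_counter() - start

    wrapper.last_duration = 0.0
    return wrapper

@timed
def chunk_count(text: str) -> int:
    return len(text.split("\n\n"))

result = chunk_count("a\n\nb\n\nc")
```

The function body stays untouched; the timing behaviour is layered on from the outside, which is the whole idea of the pattern.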

The best way to learn patterns isn't memorization — it's recognizing the problems they solve in your own code. Next time you see a messy if/elif chain or tightly coupled components, that's your signal.

Have a favorite pattern I didn't cover? Let me know on LinkedIn.

System Design 101 — Building Blocks of Scalable Systems

2025-02-24T00:00:00Z

System design interviews can feel overwhelming, but every complex system is built from the same fundamental components. Once you understand these building blocks, you can reason about any architecture.

This post covers the core components you'll see in every system design — with practical explanations of why each exists and when to use it.

The Big Picture

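Before diving into individual components, it helps to see how they fit together in a typical web-scale system: a request passes from the client through a CDN and a load balancer to the API servers, which read from a cache and a database and hand slow work to a message queue. Let me break down each component.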

1. Load Balancer — "Distribute the Traffic"

A load balancer sits in front of your servers and distributes incoming requests across multiple instances. It's a large part of why a service like Netflix stays up when millions of people hit play at 8 PM.

How It Works

Client Request
      ↓
┌─────────────────┐
│  Load Balancer   │
└─────────────────┘
   ↓      ↓      ↓
Server1 Server2 Server3

Common Algorithms

Algorithm	How It Works	Best For
Round Robin	Requests rotate through servers in order	Equal-capacity servers
Least Connections	Sends to the server with fewest active connections	Varying request complexity
IP Hash	Same client IP always goes to same server	Session persistence
Weighted	More traffic to beefier servers	Mixed hardware
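
Three of these algorithms fit in a few lines each. The sketch below is a simplified illustration: the server names and function names are hypothetical, and a real load balancer would also account for health and weights.

```python
import hashlib
import itertools

servers = ["server1", "server2", "server3"]

# Round robin: cycle through the servers in order.
_rotation = itertools.cycle(servers)

def round_robin() -> str:
    return next(_rotation)

# IP hash: a stable hash keeps the same client on the same server.
# Python's built-in hash() is randomized per process, so sha256 is used instead.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

# Least connections: pick the server with the fewest active connections.
def least_connections(active: dict[str, int]) -> str:
    return min(active, key=active.get)

first_three = [round_robin() for _ in range(3)]
busy = {"server1": 4, "server2": 1, "server3": 9}
chosen = least_connections(busy)
```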

Key Insight

Load balancers also handle health checks — they stop sending traffic to unhealthy servers automatically. This is how you achieve high availability without manual intervention.

# Simplified health check logic
class LoadBalancer:
    def __init__(self, servers: list[Server]):
        self.servers = servers

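    def get_healthy_server(self) -> Server:
        healthy = [s for s in self.servers if s.health_check()]
        if not healthy:
            # min() would raise an unhelpful ValueError on an empty list
            raise RuntimeError("no healthy servers available")
        # Least-connections choice among the healthy servers
        return min(healthy, key=lambda s: s.active_connections)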

2. Caching — "Don't Compute the Same Thing Twice"

Caching stores frequently accessed data in fast storage (usually memory) to avoid hitting slower databases or APIs repeatedly.

Cache Tiers

Request → L1 (In-Memory, ~1ms) → L2 (Redis, ~5ms) → Database (~50ms)
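
The tiered lookup can be sketched with two dictionaries standing in for process memory and Redis. `TieredCache` and `load_from_db` are hypothetical names; the latencies above are not modelled, only the lookup order.

```python
from typing import Callable

class TieredCache:
    """Look in L1, then L2, then the loader; fill the faster tiers on the way back."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self.l1: dict[str, str] = {}  # stands in for process memory
        self.l2: dict[str, str] = {}  # stands in for Redis
        self.loader = loader          # stands in for the database query

    def get(self, key: str) -> str:
        if key in self.l1:
            return self.l1[key]
        if key in self.l2:
            self.l1[key] = self.l2[key]  # promote to L1
            return self.l1[key]
        value = self.loader(key)
        self.l2[key] = value
        self.l1[key] = value
        return value

db_calls: list[str] = []

def load_from_db(key: str) -> str:
    db_calls.append(key)
    return f"row-for-{key}"

cache = TieredCache(load_from_db)
cache.get("user:1")          # miss everywhere, so the database is hit once
cache.get("user:1")          # served from L1
cache.l1.clear()             # simulate a process restart
value = cache.get("user:1")  # served from L2, no database call
```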

Cache Strategies

Cache-Aside (Lazy Loading) — The most common pattern:

async def get_user(user_id: str) -> User:
    # 1. Check cache first
    cached = await redis.get(f"user:{user_id}")
    if cached:
        return User.from_json(cached)

    # 2. Cache miss — fetch from database
    user = await db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Store in cache for next time (expire in 5 minutes)
    await redis.set(f"user:{user_id}", user.to_json(), ex=300)

    return user

Write-Through — Write to cache and database simultaneously:

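async def update_user(user_id: str, data: dict) -> User:
    # Write the database first, then refresh the cache with the full record
    # so reads through get_user() can still parse it.
    # Assumes db.update() returns the updated User (hypothetical helper).
    user = await db.update("users", user_id, data)
    await redis.set(f"user:{user_id}", user.to_json(), ex=300)
    return user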

Cache Invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Common invalidation strategies:

Strategy	Mechanism	Trade-off
TTL (Time-to-Live)	Expires after N seconds	Simple but stale reads possible
Event-based	Invalidate on write events	Fresh data but more complex
Version-based	Key includes version number	Precise but coordination needed
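
Version-based invalidation is the least obvious of the three, so here is a small sketch. The dictionaries stand in for a cache and a version store, and `cache_key`, `read`, and `on_write` are hypothetical helpers.

```python
from typing import Callable

cache: dict[str, str] = {}
versions: dict[str, int] = {}  # current version per user id

def cache_key(user_id: str) -> str:
    return f"user:{user_id}:v{versions.get(user_id, 0)}"

def read(user_id: str, load: Callable[[str], str]) -> str:
    key = cache_key(user_id)
    if key not in cache:
        cache[key] = load(user_id)
    return cache[key]

def on_write(user_id: str) -> None:
    # Bumping the version makes the old key unreachable; no delete is needed.
    versions[user_id] = versions.get(user_id, 0) + 1

profiles = {"42": "old name"}
before = read("42", profiles.__getitem__)
profiles["42"] = "new name"
on_write("42")
after = read("42", profiles.__getitem__)
```

The stale `v0` entry is never read again but still occupies memory, so this approach is usually paired with a TTL to let old versions expire.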

3. Database — "Where the Truth Lives"

SQL vs NoSQL — The Real Decision Framework

Don't pick your database based on hype. Pick it based on your access patterns:

Choose SQL When	Choose NoSQL When
Data has relationships (joins)	Data is denormalized/nested
You need ACID transactions	You need horizontal scaling
Schema is well-defined	Schema evolves rapidly
Complex queries needed	Simple key-value lookups

Database Scaling Patterns

Read Replicas — Scale reads by copying data to replica databases:

Writes → Primary DB
Reads  → Replica 1, Replica 2, Replica 3

This is the first scaling move most systems make. It works because many applications are read-heavy (a split of roughly 90% reads to 10% writes is a common rule of thumb).

Sharding — Split data across multiple databases by a key:

Users A-M → Shard 1
Users N-Z → Shard 2
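
The routing logic can be sketched in two ways: the range split shown above, and a hash-based split, which is a common alternative I am adding for comparison. `SHARDS`, `range_shard_for`, and `shard_for` are hypothetical names.

```python
import hashlib

SHARDS = ["shard-1", "shard-2"]

def range_shard_for(username: str) -> str:
    """The A-M / N-Z split from the text, keyed on the first letter."""
    return "shard-1" if username[:1].upper() <= "M" else "shard-2"

def shard_for(user_id: str) -> str:
    """Hash-based routing with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Range sharding keeps neighbouring keys together but can produce hot shards when names cluster; hash sharding spreads load more evenly, though a plain modulo remaps most keys whenever the number of shards changes.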

⚠️ Warning: Sharding adds massive complexity. Don't shard until you absolutely have to. A single well-optimized Postgres instance can handle hundreds of millions of rows.

4. Message Queue — "Do It Later"

Message queues decouple producers from consumers, letting you process work asynchronously.

Why Queues Matter

Without a queue, if your email service is slow, your entire API slows down:

# ❌ Synchronous — user waits for email to send
POST /signup → Create User → Send Email → Return 200
                                  ↑
                         Slow! (2-5 seconds)

With a queue, the API returns instantly:

# ✅ Asynchronous — user gets instant response
POST /signup → Create User → Push to Queue → Return 200
                                     ↓
                        Worker picks up → Sends Email

Queue Pattern in Practice

# Producer — API server
async def handle_signup(request):
    user = await create_user(request.data)

    # Don't send email now — push to queue
    await queue.publish("emails", {
        "type": "welcome",
        "to": user.email,
        "name": user.name,
    })

    return {"status": "created"}  # Returns in ~50ms

# Consumer — Background worker
async def email_worker():
    async for message in queue.subscribe("emails"):
        await send_email(
            to=message["to"],
            template=message["type"],
            data=message,
        )
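
The snippet above assumes a `queue` client that is not shown. For a version you can run locally, here is a sketch of the same producer/consumer shape using `asyncio.Queue` from the standard library. It is an in-process stand-in, not a real message broker, and `send_email` only records what it would have sent.

```python
import asyncio

sent: list[str] = []

async def send_email(to: str, template: str) -> None:
    await asyncio.sleep(0)  # stand-in for a slow SMTP or API call
    sent.append(f"{template}->{to}")

async def email_worker(queue: asyncio.Queue) -> None:
    while True:
        message = await queue.get()
        try:
            await send_email(message["to"], message["type"])
        finally:
            queue.task_done()  # lets queue.join() know this item is finished

async def handle_signup(queue: asyncio.Queue, email: str) -> dict:
    await queue.put({"type": "welcome", "to": email})
    return {"status": "created"}  # returns without waiting for the email

async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(email_worker(queue))
    response = await handle_signup(queue, "new.user@example.com")
    await queue.join()  # demo only: wait until the worker drains the queue
    worker.cancel()
    return response

response = asyncio.run(main())
```

An in-process queue loses its contents when the process dies, which is why production systems usually put a durable broker between the API and its workers.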

When to Use Queues

Sending emails/notifications
Processing images/videos
Syncing data between services
Any work that doesn't need an immediate response

5. CDN — "Serve Content Close to Users"

A Content Delivery Network caches your static content (images, CSS, JS) on servers worldwide, so users download from a nearby server instead of your origin.

Impact

Without CDN:  User in Tokyo → Server in Virginia → 200ms latency
With CDN:     User in Tokyo → CDN edge in Tokyo → 20ms latency

That's a 10x improvement just by putting a CDN in front of your static assets.

What to Put on a CDN

✅ Images, videos, fonts
✅ CSS and JavaScript bundles
✅ Static HTML pages
✅ API responses that rarely change
❌ User-specific dynamic content
❌ Real-time data

Putting It All Together

Here's how these components work for a real system — let's say a social media feed:

CDN serves profile images and static assets
Load Balancer routes API requests across server instances
API Servers handle business logic
Cache (Redis) stores pre-computed feeds and session data
Database stores users, posts, and relationships
Message Queue handles async work — push notifications, feed updates, email digests

Scaling Checklist

When your system needs to scale, follow this order:

Add caching (biggest bang for buck)
Add read replicas (scale reads)
Add a CDN (offload static content)
Add message queues (decouple and go async)
Add more app servers + load balancer (horizontal scaling)
Shard the database (last resort — high complexity)

Key Takeaways

Component	One-Line Summary
Load Balancer	Distributes traffic, enables horizontal scaling
Cache	Stores hot data in memory, reduces database load
Database	Persistent storage — pick SQL or NoSQL based on access patterns
Message Queue	Decouples services, enables async processing
CDN	Serves static content from edge locations near users

The beauty of these building blocks is that they compose. Start simple — a single server with a database — and add components as your scale demands them. Every engineering decision is a trade-off, and the best architecture is the simplest one that meets your requirements.

Next up: I'll dive deeper into database scaling patterns and when to actually shard. Follow me on LinkedIn to stay updated.
