LangGraph & Stateful Workflows

Hard · 35 min read

What is LangGraph?

Why LangGraph Matters

The Problem: Linear agent chains break down when you need branching logic, cycles, persistent state, or human approval steps in your workflows.

The Solution: LangGraph models agent workflows as stateful graphs where nodes are computation steps and edges define control flow, enabling complex multi-step processes with built-in persistence.

Real Impact: LangGraph powers production agent systems at enterprises needing reliable, resumable, and auditable AI workflows.

Real-World Analogy

Think of LangGraph as a subway system map:

  • Nodes (Stations) = Processing steps where work happens
  • Edges (Tracks) = Connections defining which station comes next
  • Conditional Edges = Switch tracks based on conditions (like transfer stations)
  • State = Your ticket that carries information between stations
  • Checkpoints = Saved positions so you can resume your journey

Core LangGraph Concepts

StateGraph

The main class for defining graph-based workflows. Each graph has a defined state schema and a set of nodes connected by edges.

Nodes

Python functions that receive the current state, perform computation, and return state updates. Each node represents one step in the workflow.

Conditional Edges

Dynamic routing that evaluates state to determine which node to execute next, enabling branching and looping patterns.

Checkpointing

Built-in persistence that saves graph state after each node, enabling resume, replay, and human-in-the-loop interactions.

Key Takeaway: LangGraph extends LangChain by treating agent workflows as directed graphs with nodes (processing steps), edges (transitions), and state (shared context). This gives you explicit control over the agent's decision flow, unlike the implicit loops of basic agent frameworks.
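The takeaway above can be sketched without LangGraph itself. This framework-free toy runner (all names here are illustrative, not LangGraph API) passes a shared state dict through nodes and follows fixed and conditional edges, mirroring the nodes/edges/state model:

```python
# Toy graph runner mirroring LangGraph's node/edge/state model.
# Illustrative only; this is NOT the LangGraph API.

def draft(state):
    # Node: does work, returns a partial state update
    return {"text": state["text"] + " [drafted]"}

def review(state):
    return {"approved": "[drafted]" in state["text"]}

def route_after_review(state):
    # Conditional edge: inspect state to pick the next node
    return "END" if state["approved"] else "draft"

nodes = {"draft": draft, "review": review}
edges = {"draft": "review", "review": route_after_review}  # fixed + conditional

def run(state, entry="draft"):
    current = entry
    while current != "END":
        state = {**state, **nodes[current](state)}  # merge the node's update
        nxt = edges[current]
        current = nxt(state) if callable(nxt) else nxt
    return state

print(run({"text": "hello", "approved": False}))
# {'text': 'hello [drafted]', 'approved': True}
```

LangGraph adds typed state schemas, reducers, and persistence on top of this basic loop, but the control-flow idea is the same.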

Graph Concepts

LangGraph State Machine Diagram
START → Agent → Route?
  ├─ use_tool    → Tools → (observation) → back to Agent
  ├─ needs_human → Human → (approved)    → back to Agent
  └─ done        → END

Building Graphs

langgraph_basic.py
from typing import TypedDict, Annotated, Sequence
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage, HumanMessage
import operator

# Define the state schema
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    next_action: str

# Define node functions
def agent_node(state: AgentState) -> AgentState:
    # Bind your tool definitions so the model can emit tool_calls;
    # "tools" is your own list of tool functions, defined elsewhere
    llm = ChatOpenAI(model="gpt-4").bind_tools(tools)
    response = llm.invoke(state["messages"])
    return {"messages": [response], "next_action": "decide"}

def tool_node(state: AgentState) -> AgentState:
    # Execute the tool calls from the last message.
    # execute_tools is a placeholder for your own dispatch helper that
    # runs each call and returns a ToolMessage with the result.
    last_msg = state["messages"][-1]
    result = execute_tools(last_msg.tool_calls)
    return {"messages": [result]}

# Define routing logic
def should_continue(state: AgentState) -> str:
    last_msg = state["messages"][-1]
    if last_msg.tool_calls:
        return "tools"
    return "end"

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)

graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
graph.add_edge("tools", "agent")

# Compile and run
app = graph.compile()
result = app.invoke({
    "messages": [HumanMessage(content="What is 25 * 4?")]
})
Output
Graph compiled: 2 nodes (agent, tools) + END

Running: "What is 25 * 4?"
Step 1: agent -> tool call: multiply(25, 4)
Step 2: tools -> observation: 100
Step 3: agent -> "25 * 4 = 100"
Done

Common Mistake

Wrong: Creating graph cycles without a maximum iteration limit

Why it fails: An agent that loops between "research" and "evaluate" nodes can run indefinitely if the evaluation criteria are never met, consuming unlimited tokens and time.

Instead: Always set a recursion_limit in the run config, e.g. app.invoke(inputs, {"recursion_limit": 10}) (LangGraph's default is 25 steps). Also add a counter to your state and include an exit condition that triggers after N iterations: "If we've researched 3 times, synthesize what we have."
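One way to wire that exit condition, sketched without LangGraph (the state keys and the limit of 3 are illustrative):

```python
MAX_RESEARCH_ROUNDS = 3  # illustrative budget

def research_node(state):
    # Bump a counter in state on every pass through the loop
    return {**state,
            "iterations": state["iterations"] + 1,
            "notes": state["notes"] + ["finding"]}

def route(state):
    # Exit when the quality bar is met OR the budget runs out
    if state["satisfied"] or state["iterations"] >= MAX_RESEARCH_ROUNDS:
        return "synthesize"
    return "research"

state = {"iterations": 0, "notes": [], "satisfied": False}
while route(state) == "research":
    state = research_node(state)

print(state["iterations"])  # stops at 3 even though satisfied is still False
```

The same guard in LangGraph would live in your conditional-edge function, with the counter field declared in the state schema.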

State Management

State Reducers

  • operator.add: Appends new values to existing list (for messages)
  • Default: Overwrites the previous value with the new one
  • Custom: Write your own reducer function for complex merge logic
  • Annotated: Use Python's Annotated type to attach reducers to fields
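A minimal sketch of how a reducer-aware merge could work, using only the standard library (the apply_update helper is illustrative, not LangGraph internals, but the Annotated[list, operator.add] pattern matches how reducers are declared):

```python
import operator
from typing import Annotated, get_type_hints

# Schema: messages carries operator.add as its reducer; status has none
class Schema:
    messages: Annotated[list, operator.add]
    status: str

def apply_update(schema, state, update):
    # Illustrative merge: apply each field's reducer if one is attached
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        meta = getattr(hints.get(key), "__metadata__", ())
        if meta:                  # reducer attached: combine old and new
            merged[key] = meta[0](state.get(key, []), value)
        else:                     # no reducer: overwrite
            merged[key] = value
    return merged

s = {"messages": ["hi"], "status": "start"}
s = apply_update(Schema, s, {"messages": ["there"], "status": "running"})
print(s)  # {'messages': ['hi', 'there'], 'status': 'running'}
```

Note how messages accumulates (operator.add concatenates the lists) while status is simply replaced, which is exactly the default-vs-reducer distinction in the bullets above.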
state_with_persistence.py
from langgraph.checkpoint.sqlite import SqliteSaver

# Add persistence with checkpointing
# (note: in newer langgraph versions from_conn_string is a context
#  manager, used as "with SqliteSaver.from_conn_string(...) as memory:")
memory = SqliteSaver.from_conn_string(":memory:")

app = graph.compile(checkpointer=memory)

# Run with a thread ID for persistence
config = {"configurable": {"thread_id": "user-123"}}

# First interaction
result1 = app.invoke(
    {"messages": [HumanMessage(content="Hi, my name is Alice")]},
    config
)

# Second interaction - remembers context!
result2 = app.invoke(
    {"messages": [HumanMessage(content="What's my name?")]},
    config
)
# Agent responds: "Your name is Alice"

# Human-in-the-loop: interrupt before a node
app_with_interrupt = graph.compile(
    checkpointer=memory,
    interrupt_before=["tools"]  # Pause before tool execution
)

Checkpointing

Common Pitfall

Problem: State grows unbounded as messages accumulate, leading to token limit errors.

Solution: Implement state trimming in your nodes. Use a custom reducer that keeps only the last N messages, or summarize older messages before adding new ones.
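For example, a trimming reducer might look like this (the window size of 20 is an illustrative choice):

```python
def keep_last_n(existing, new, n=20):
    # Append the new messages, then drop everything but the most recent n
    return (list(existing) + list(new))[-n:]

history = [f"msg-{i}" for i in range(25)]
history = keep_last_n(history, ["msg-25", "msg-26"], n=20)
print(len(history), history[0], history[-1])  # 20 msg-7 msg-26
```

To use it as a LangGraph reducer, attach a two-argument version (with n fixed) to the field, e.g. Annotated[list, keep_last_n]. For long conversations, summarizing the dropped messages before trimming preserves more context than deletion alone.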

Feature       | LangChain AgentExecutor | LangGraph
Control Flow  | Linear loop             | Arbitrary graph with cycles
State         | Messages only           | Custom typed state schema
Persistence   | Manual memory           | Built-in checkpointing
Human-in-Loop | Not built-in            | interrupt_before / interrupt_after
Branching     | Limited                 | Conditional edges with routing
Deep Dive: LangGraph Checkpointing

LangGraph's checkpointing lets you save and resume graph execution at any node. This is critical for human-in-the-loop workflows: the graph pauses at an approval node, persists its state to a database, and resumes when the human approves. Checkpointing also enables time-travel debugging: replay the graph from any previous checkpoint to understand how the agent reached a particular decision. Use PostgreSQL or Redis backends in production; SQLite is fine for development.
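The save-and-resume mechanics can be sketched with the standard library alone (the table layout and function names are illustrative, not how LangGraph's checkpointers actually serialize state):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE checkpoints (
    thread_id TEXT, step INTEGER, state TEXT,
    PRIMARY KEY (thread_id, step))""")

def save_checkpoint(thread_id, step, state):
    # Persist the full state snapshot for this thread at this step
    conn.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                 (thread_id, step, json.dumps(state)))
    conn.commit()

def load_latest(thread_id):
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1", (thread_id,)).fetchone()
    return (row[0], json.loads(row[1])) if row else (None, None)

# Simulate a run that pauses at an approval step
save_checkpoint("user-123", 1, {"messages": ["draft ready"], "approved": False})

# Later: the human approves, and execution resumes from the saved state
step, state = load_latest("user-123")
state["approved"] = True
save_checkpoint("user-123", step + 1, state)

print(load_latest("user-123"))
# (2, {'messages': ['draft ready'], 'approved': True})
```

Because every step is a row, replaying from any earlier step (time-travel debugging) is just a SELECT on a smaller step value.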

Quick Reference

Essential LangGraph API

Function                 | Description          | Example
StateGraph()             | Create a state graph | StateGraph(AgentState)
.add_node()              | Add a node           | graph.add_node("name", func)
.add_edge()              | Add a fixed edge     | graph.add_edge("a", "b")
.add_conditional_edges() | Add a routing edge   | graph.add_conditional_edges("a", fn, map)
.set_entry_point()       | Set the start node   | graph.set_entry_point("agent")
.compile()               | Compile the graph    | app = graph.compile(checkpointer=mem)
interrupt_before         | Pause before a node  | compile(interrupt_before=["tools"])