Overview

Multi-agent systems are coordination, not magic. Adding a second agent helps when it lets you parallelize independent work, separate cheap and expensive context, or isolate a tool budget. It hurts when it adds a hop without doing any of those things. Start with one agent; promote to multi-agent when the single-agent design is the bottleneck.

Use multiple agents when the work is genuinely parallel

Three signals that a job wants multiple agents.

  • Independent subtasks. Reviewing 20 files, scraping 100 URLs, generating 8 variants. Fan out, gather.
  • Asymmetric context cost. A planner needs the full repo context once; an executor only needs the diff for the current task. Keep them separate so the executor’s context stays small. The brief pattern in Claude Code is exactly this split.
  • Isolated tool budgets. A search agent gets a search tool; a writer gets a file-write tool. Neither can call the other’s tools by accident.

If none of these apply, a single agent with a longer system prompt is cheaper and easier to debug.
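The tool-budget bullet above can be sketched as one tool registry per agent, so a wrong-tool call fails loudly instead of silently succeeding. A minimal sketch; all agent and tool names here are illustrative, not a real API:

```python
def make_agent(name: str, tools: dict):
    """Bind an agent to its own tool registry (names are illustrative)."""
    def call_tool(tool_name: str, *args):
        if tool_name not in tools:
            # The other agent's tools simply do not exist from here.
            raise PermissionError(f"{name} has no tool {tool_name!r}")
        return tools[tool_name](*args)
    return call_tool

# Each agent sees only its own budget of tools.
searcher = make_agent("searcher", {"search": lambda q: f"results for {q}"})
writer = make_agent("writer", {"write_file": lambda path, text: len(text)})
```

The isolation is structural: `searcher("write_file", ...)` raises, so a prompt-level mistake cannot cross the tool boundary.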

Orchestrator-worker for fan-out work

One agent decomposes the task, dispatches workers, and merges results. Workers do not talk to each other.

Orchestrator: "Review these 12 PRs. For each, return a verdict object."
  → Worker[1]: review PR #401 → {verdict, reasons}
  → Worker[2]: review PR #402 → {verdict, reasons}
  ...
Orchestrator: merge into a single report.

Workers are stateless across PRs. The orchestrator owns the merge and is the only agent with the global view.
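The fan-out/gather shape above can be sketched with a thread pool and a stubbed worker. The `review_pr` body is a placeholder for a real model call; the structure, not the stub, is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def review_pr(pr_number: int) -> dict:
    # Placeholder worker: a real one would call a model with the PR diff.
    return {"pr": pr_number, "verdict": "approve", "reasons": ["stub"]}

def orchestrate(pr_numbers: list[int]) -> dict:
    # Fan out: each worker is stateless and sees only its own PR.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(review_pr, pr_numbers))
    # Gather: only the orchestrator sees all verdicts.
    approved = [r["pr"] for r in results if r["verdict"] == "approve"]
    return {"reviewed": len(results), "approved": approved}
```

Workers never see each other's results; the merge happens in exactly one place.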

Planner-executor when context is expensive

The planner reads the whole world once and writes a plan. The executor follows the plan, one step at a time, with the plan as its only context.

  • Planner output is a numbered list of steps, each with inputs, expected outputs, and a verification command.
  • Executor reads step N, runs it, verifies, advances to step N+1.

This is the pattern Claude Code uses with the brief. The plan lives in a file the executor re-reads on every step.
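The executor side can be sketched as a loop over a plan file. The plan format here (a JSON list of steps with `cmd` and `verify` shell commands) is an assumption for illustration, not Claude Code's actual format:

```python
import json
import subprocess

def run_executor(plan_path: str, max_steps: int = 30) -> list[dict]:
    """Follow a planner-written plan one step at a time.

    The plan file is the executor's only context, re-read on every step.
    """
    results = []
    for n in range(max_steps):
        with open(plan_path) as f:
            steps = json.load(f)      # re-read: the planner may have revised it
        if n >= len(steps):
            break                     # plan complete
        subprocess.run(steps[n]["cmd"], shell=True, check=True)
        check = subprocess.run(steps[n]["verify"], shell=True)
        results.append({"step": n, "ok": check.returncode == 0})
        if check.returncode != 0:
            break                     # surface the failure, don't drift on
    return results
```

Re-reading the file each step keeps the executor's context at one plan, not the whole repo.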

Reviewer-and-fixer loop for quality

A writer produces a draft; a reviewer returns a structured critique; the writer revises. Two agents, bounded loop.

prior_feedback = none
for round in 1..max_rounds (3):
  draft = writer.run(brief, prior_feedback)
  feedback = reviewer.run(draft, rubric)
  if feedback.verdict == "approve": break
  prior_feedback = feedback

The reviewer must return structured output (see prompt-design). Free-form critique loops do not terminate cleanly.

Communicate through structured handoffs

Agents talk through JSON, not chat. Every handoff is a typed message with fields the receiver expects.

{
  "task_id": "review-401",
  "input": { "pr_url": "..." },
  "output": { "verdict": "approve", "reasons": ["..."] },
  "cost": { "input_tokens": 4210, "output_tokens": 380 }
}

Free-form chat between agents drifts. Schemas force the protocol to stay tight, and they make logs greppable.
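One way to keep the protocol tight is to validate every message at the boundary. A minimal sketch with a dataclass envelope (the `Handoff` type and its required fields are this document's example schema, not a library API):

```python
from dataclasses import dataclass, field
import json

@dataclass
class Handoff:
    """Typed envelope for one agent-to-agent message."""
    task_id: str
    input: dict
    output: dict
    cost: dict = field(default_factory=dict)

    @classmethod
    def parse(cls, raw: str) -> "Handoff":
        data = json.loads(raw)
        missing = {"task_id", "input", "output"} - data.keys()
        if missing:
            # Reject malformed messages here instead of letting a
            # downstream agent improvise around them.
            raise ValueError(f"handoff missing fields: {sorted(missing)}")
        return cls(**data)
```

Parsing at the receiver means a drifting sender fails fast and shows up in the logs, not three hops later.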

Cap iterations and spend per task

Every loop needs a termination condition. Every agent run needs a cost ceiling.

  • Hard cap on iterations: max_rounds = 3 for review loops, max_steps = 30 for executors.
  • Token budget per task: stop when input + output exceeds N tokens; return partial results.
  • Wall-clock timeout: stop after T seconds regardless of state.

Agent loops with no termination burn money silently. The cheapest bug in this space is “ran for 14 hours, did nothing useful.”

Log every step for replay

Persist every prompt, response, and tool call to a structured log. When a run fails, you need to replay it without rerunning the model.

  • One log line per agent turn, with agent_id, turn, input, output, tool_calls, cost.
  • Store enough to reconstruct the next turn from the previous one.

Logs are also the input to evals: a golden set of (input, expected output) pairs is built from replayed runs.
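One line per turn maps naturally onto JSONL: append on write, parse line by line on replay. A minimal sketch using the fields from the list above:

```python
import json
import time

def log_turn(log_path: str, agent_id: str, turn: int, input_msg: dict,
             output_msg: dict, tool_calls: list, cost: dict) -> None:
    """Append one agent turn to a JSONL log, one line per turn."""
    record = {
        "ts": time.time(), "agent_id": agent_id, "turn": turn,
        "input": input_msg, "output": output_msg,
        "tool_calls": tool_calls, "cost": cost,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

def replay(log_path: str) -> list[dict]:
    """Reload a run without re-running the model; each record holds
    enough to reconstruct the next turn from the previous one."""
    with open(log_path) as f:
        return [json.loads(line) for line in f]
```

`replay` is also where a golden set starts: pull `input` and `output` pairs from known-good runs and freeze them as eval cases.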