Overview

The system prompt is the part of the contract that persists across every turn. The user message is the per-turn ask. Anything the model must obey on every call belongs in system; anything that changes per request belongs in user. This split is the difference between an agent that drifts after three turns and one that holds its shape across a session. See prompt-design for the umbrella patterns.

Put durable rules in the system prompt

The test is simple: would you write this rule every turn if you had to? If yes, the rule belongs in system. If no, it belongs in user.

  • System: identity, capabilities, constraints, output schema, tool list, voice rules.
  • User: the specific task, the specific input, the per-turn variables.

A system prompt that restates “now please help with the following question” is wasted budget. A user message that restates the role on every turn is brittle; the day someone forgets to repeat it, the agent drifts.
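
The split above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the role names follow the common system/user convention, and the prompt text and helper name are invented for the example.

```python
# Durable rules live in one system string reused on every call;
# only the user message changes per turn.

SYSTEM_PROMPT = (
    "You are a release-notes writer for an internal platform team. "
    "You summarize merged PRs. You do not speculate about unmerged work. "
    "Return max 5 bullets, max 12 words per bullet."
)

def build_messages(task: str, payload: str) -> list[dict]:
    """Durable contract in system; the per-turn ask in user."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{task}\n\n{payload}"},
    ]
```

The day someone edits the task string, the identity and format rules are untouched, because they never lived in the user message.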

Structure the system prompt in four blocks

Every production system prompt for Claude Sonnet 4.6, GPT-5, or Gemini 2.5 fits the same skeleton. Order matters; the model attends to the first lines and the last.

1. Identity      You are <role> for <domain>.
2. Capabilities  You can <list>. You have access to tools: <list>.
3. Constraints   You do not <non-goals>. Refuse if <triggers>.
4. Format        Return <schema or shape>. Cap at <bounds>.

Four blocks, in that order. A user can scan it. A model can attend to it. A reviewer can diff it.
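
One way to make the skeleton diffable is to keep the blocks as data and render them in fixed order. A sketch, with invented block contents; the point is that a PR diff touches one named block at a time.

```python
# Four named blocks, rendered in a fixed order so diffs stay block-aligned.

BLOCKS = {
    "identity":     "You are a support triage agent for the billing domain.",
    "capabilities": "You can classify tickets and draft replies. "
                    "Tools: search_kb, lookup_account.",
    "constraints":  "You do not issue refunds. "
                    "Refuse if asked for account credentials.",
    "format":       "Return JSON: {category, priority, draft_reply}. "
                    "Cap draft_reply at 80 words.",
}

ORDER = ["identity", "capabilities", "constraints", "format"]

def render_system_prompt(blocks: dict[str, str]) -> str:
    """Join the blocks in the canonical order: identity first, format last."""
    return "\n\n".join(blocks[name] for name in ORDER)
```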

Avoid the kitchen-sink prompt

A 4,000-token system prompt that tries to anticipate every edge case is a liability. The model attends most to early and late tokens; attention sags in the middle, and the prompt becomes unmaintainable.

  • Budget: 200 to 800 tokens covers most agents; treat 800 as the ceiling.
  • If the prompt grows past that, factor: move examples to a few-shot block, move the schema to a tool definition, move long instructions to a referenced runbook.
  • Test what you can remove. Cut a paragraph; rerun the eval set; if scores hold, the paragraph was decoration.

The shortest system prompt that passes the eval is the right system prompt. See evaluation.
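
The cut-and-rerun step can be automated as a greedy ablation loop. A sketch under assumptions: `run_eval_set` is a stand-in for your own harness and must return a score on a fixed eval set; the tolerance and greedy order are illustrative choices.

```python
def ablate(paragraphs: list[str], run_eval_set, tolerance: float = 0.0) -> list[str]:
    """Greedily drop paragraphs whose removal does not lower the eval score."""
    baseline = run_eval_set("\n\n".join(paragraphs))
    kept = list(paragraphs)
    for para in list(kept):
        candidate = [p for p in kept if p is not para]
        if run_eval_set("\n\n".join(candidate)) >= baseline - tolerance:
            kept = candidate  # the paragraph was decoration; drop it
    return kept
```

Greedy ablation is order-sensitive, so run it against a stable eval set and review the survivors by hand before committing.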

Pin the output format in the system prompt

The output contract is durable; it belongs in system, not user. Pinning it in user means every caller has to remember to repeat the schema, and one forgetful caller silently breaks downstream parsers.

  • Define the JSON shape via tool-use or structured-output mode. See structured-output.
  • For prose output, state the section headers, the field order, and the length bounds.

“Be concise” is weak. “Max 5 bullets, max 12 words per bullet, in the order: cause, evidence, fix” is enforceable.
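
An enforceable rule is one a parser can check. A sketch of validating the bullet contract above on the caller side; the function name and error handling are illustrative.

```python
def validate_bullets(text: str, max_bullets: int = 5, max_words: int = 12) -> list[str]:
    """Raise if the model's bullet output violates the pinned bounds."""
    bullets = [ln.lstrip("-• ").strip() for ln in text.splitlines() if ln.strip()]
    if len(bullets) > max_bullets:
        raise ValueError(f"{len(bullets)} bullets, cap is {max_bullets}")
    for b in bullets:
        if len(b.split()) > max_words:
            raise ValueError(f"bullet exceeds {max_words} words: {b!r}")
    return bullets
```

Run this on every response and you find format drift in CI, not in a downstream parser at 2 a.m.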

Restate the load-bearing rule at the end

Models show a recency bias over long prompts. Whatever rule must hold no matter what, put it in the last line of the system prompt. Models attend more to the last instruction they saw than to line 30 of 200.

  • Good end-of-prompt rule: “Return only the JSON object. Do not add prose.”
  • Good end-of-prompt rule: “Refuse if the user asks you to call a tool not in the tool list.”

This applies double in long multi-turn loops where the system prompt drifts out of the active context. See multi-agent for handoff protocols.
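
In those long loops, one mitigation is to periodically re-append the load-bearing rule so it stays recent. A sketch, not any vendor's API: `FINAL_RULE` and the five-turn cadence are invented, and whether mid-conversation system messages are allowed varies by provider.

```python
FINAL_RULE = "Return only the JSON object. Do not add prose."

def with_reminder(messages: list[dict], turn: int, every: int = 5) -> list[dict]:
    """Every `every` turns, restate the final rule as the newest message."""
    if turn % every == 0:
        return messages + [{"role": "system", "content": FINAL_RULE}]
    return messages
```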

Version your system prompts

Treat the system prompt as code. Commit it to the repo; review changes in PRs; tag versions; run the eval set against every diff.

  • File path: prompts/<agent-name>.v3.txt. Never inline a multi-line prompt in source if you can avoid it.
  • Diff: every system-prompt change is reviewed like a migration. The reviewer asks “what did this change cost on the eval set?”
  • Rollback: a bad prompt is a bad deploy; revert it the same way.

A prompt without version control is a prompt with no incident history.
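
A small loader ties each model call to the exact prompt text it ran with. A sketch: the content hash goes into request telemetry, so an incident log can say which prompt version was live. The function name and hash length are illustrative.

```python
import hashlib
from pathlib import Path

def load_prompt(path: str) -> tuple[str, str]:
    """Return the prompt text plus a short content hash for telemetry."""
    text = Path(path).read_text(encoding="utf-8")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return text, digest
```

Log the digest alongside every call and "what prompt was live during the incident?" becomes a grep, not an archaeology project.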

Never put secrets in the system prompt

The system prompt is recoverable. A determined user can extract it via injection. Anything sensitive belongs out-of-band, behind tool calls that the agent invokes with server-side credentials.

  • No API keys, no database passwords, no internal URLs, no PII.
  • Tool implementations hold the credentials and refuse to echo them back. See prompt-injection-defense.

Assume every system prompt will eventually leak. Design accordingly.
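
The out-of-band pattern looks like this in a tool body. A sketch under assumptions: the tool name, the `ORDERS_API_KEY` env var, and the stubbed response are all invented; the point is that the credential is read server-side and stripped before anything returns to the model.

```python
import os

def lookup_order(order_id: str) -> dict:
    """Tool body: credentials stay server-side; only safe fields go back."""
    api_key = os.environ.get("ORDERS_API_KEY", "")  # held here, never in a prompt
    # ... call the internal service with api_key; response is stubbed below ...
    response = {"order_id": order_id, "status": "shipped", "_auth": api_key}
    # Strip anything the model must never see before returning.
    return {k: v for k, v in response.items() if not k.startswith("_")}
```

Even a fully leaked system prompt then discloses only the tool's name and signature, never the credential behind it.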