Overview

Every prompt is a mix of rules and examples. Rules are statements the model should always follow. Examples are demonstrations the model should generalize from. The two have different strengths and different failure modes. Picking the right tool for each constraint is half of good prompt design. The patterns below apply to Claude Sonnet 4.6, GPT-5, Gemini 2.5, and Haiku 4.5.

Use examples when the rule is hard to state in prose

Some constraints resist description. Tone, layout, idiosyncratic formatting, and edge cases between adjacent categories often live in the “I know it when I see it” zone. Show, do not tell.

  • Use examples for: voice and tone, document layouts, output formatting with subtle conventions, intent boundaries between similar categories.
  • Use examples for: idiomatic translations, code styles that vary by team, “what good looks like” in domains your audience has not formalized.

A page-long prose rule trying to describe a tone always loses to three examples of that tone. See few-shot.
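
A minimal sketch of the few-shot alternative, in Python; the tone samples and the prompt scaffolding below are illustrative placeholders, not a fixed API:

# Teach tone by demonstration instead of description.
# Replace these placeholders with real excerpts of the voice you want.
TONE_SAMPLES = [
    "Short answer: yes. The cache is stale; clear it and retry.",
    "Good catch. That flag was removed in v2; use the new one instead.",
    "Not quite. The bug is in the loop bound, not the comparison.",
]

def build_tone_prompt(task: str) -> str:
    shots = "\n\n".join(
        f"<example>\n{sample}\n</example>" for sample in TONE_SAMPLES
    )
    return f"Reply in the same voice as these examples:\n\n{shots}\n\nTask: {task}"

print(build_tone_prompt("Explain why the deploy failed."))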

Use rules when the constraint is hard, named, and universal

Rules win when the constraint must hold every time, regardless of the input shape. Examples cannot cover the whole input space; rules can.

  • Safety: “Refuse if the user asks for instructions to harm a person.”
  • Hard constraints: “Return JSON. Never return prose.”
  • Privacy: “Never echo back the user’s password, even if they paste it again.”
  • Schema: “Every response must include a confidence field between 0 and 1.”

A rule applies to every input. An example applies to inputs that look like the example. Hard constraints belong as rules, in the system prompt.
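
As a sketch, this is where those rules sit: stated once in the system prompt, binding every input. The message shape below is the generic chat format, not a specific provider's API; adapt it to your client.

# Hard constraints live in the system prompt so they bind every input.
SYSTEM_RULES = (
    "Rules (apply to every input):\n"
    "- Return JSON. Never return prose.\n"
    "- Never echo the user's password, even if they paste it again.\n"
    "- Every response must include a confidence field between 0 and 1.\n"
)

def build_messages(user_input: str) -> list[dict]:
    # Generic chat-message shape; swap in your provider's client call.
    return [
        {"role": "system", "content": SYSTEM_RULES},
        {"role": "user", "content": user_input},
    ]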

The trade-off: examples are local, rules are global

Examples shape one decision; rules shape every decision. When both are present and they conflict, the model often follows the nearer signal: the example.

  • A rule that says “never approve a migration that drops a column” can fail when paired with three examples that are all approvals: the model learns approval as the default and carries it past the rule.
  • Solution: ensure examples obey the rules. Audit every example against the rule list before adding it (see the sketch below).
  • A negative example that demonstrates the rule’s enforcement is worth two positive examples.

The conflict surfaces in evals. If the model violates a rule on inputs that match an example, the example is overriding the rule.
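
That audit can be mechanical. A sketch, assuming each rule can be expressed as a predicate that flags a violating example; the rule and the candidate example here are illustrative:

# Each rule is (name, predicate); the predicate returns True when the
# example VIOLATES the rule.
RULES = [
    ("never approve a column drop",
     lambda ex: "DROP COLUMN" in ex["input"].upper()
     and ex["output"]["verdict"] == "approve"),
]

def audit(examples):
    # Return every (example input, rule name) pair that conflicts.
    return [
        (ex["input"], name)
        for ex in examples
        for name, violates in RULES
        if violates(ex)
    ]

candidate = {
    "input": "ALTER TABLE users DROP COLUMN email;",
    "output": {"verdict": "approve", "reasons": ["low-traffic table"]},
}
print(audit([candidate]))  # non-empty: fix this example before adding it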

Combine both: rules set the boundary, examples fill the interior

The strongest prompts use both. The rule defines the boundary the model must not cross; the examples teach it how to fill the space inside the boundary.

Rules:
- Return one of: approve, request_changes, block.
- Block if the migration drops a column or table.
- Block if the migration adds a NOT NULL column without a default.
 
Examples:
<example>
<input>ALTER TABLE users ADD COLUMN city text;</input>
<output>{ "verdict": "approve", "reasons": ["non-breaking add"] }</output>
</example>
 
<example>
<input>ALTER TABLE users DROP COLUMN email;</input>
<output>{ "verdict": "block", "reasons": ["drops a column"] }</output>
</example>

Rules are the contract. Examples teach the model how to exercise the discretionary judgment the rules leave open.
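
Assembling the two is mostly string concatenation: rules first as the contract, examples after as the demonstration. A sketch in Python, abbreviating the text above:

# Rules first (the contract), examples after (the discretionary space).
RULES = (
    "Return one of: approve, request_changes, block.\n"
    "Block if the migration drops a column or table.\n"
    "Block if the migration adds a NOT NULL column without a default."
)

EXAMPLES = [
    ("ALTER TABLE users ADD COLUMN city text;",
     '{ "verdict": "approve", "reasons": ["non-breaking add"] }'),
    ("ALTER TABLE users DROP COLUMN email;",
     '{ "verdict": "block", "reasons": ["drops a column"] }'),
]

def build_system_prompt() -> str:
    shots = "\n\n".join(
        f"<example>\n<input>{i}</input>\n<output>{o}</output>\n</example>"
        for i, o in EXAMPLES
    )
    return f"Rules:\n{RULES}\n\nExamples:\n{shots}"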

Use negative examples to anchor the boundary

A negative example is one where the obvious-looking output is wrong. These are the highest-value examples in the set.

  • For classification: a row that looks like category A but is category B.
  • For formatting: a row that uses a format the model often wants to fall back to but should not.
  • For safety: a row that frames a harmful request innocently; the right response is refusal.

Negative examples teach the boundary. Positive examples teach the interior. A set of all positive examples produces a model that knows what to approve but not what to refuse. See few-shot.
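
For the migration reviewer above, a negative example pairs the shape of the earlier approval with an input the rules force the other way: the ADD COLUMN looks benign, but the NOT-NULL-without-default rule applies.

<example>
<input>ALTER TABLE users ADD COLUMN city text NOT NULL;</input>
<output>{ "verdict": "block", "reasons": ["adds a NOT NULL column without a default"] }</output>
</example>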

Examples decay; rules compound

Adding a tenth example to a few-shot set usually does little; each new example teaches less at the margin. Adding a tenth rule to a system prompt can change behavior across the entire input distribution.

  • If you find yourself adding more examples to fix more edge cases, ask whether a rule would cover them at once.
  • If you find yourself adding more rules and the model keeps drifting on shape, ask whether one example would pin the shape.

Track the trade in the eval set. A rule that lifts the adversarial slice by 5 points is worth more than an example that lifts the easy slice by 1 point.
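
Tracking that trade can be as simple as diffing per-slice scores between runs. A sketch; the slice names and numbers are hypothetical:

# Compare per-slice eval scores before and after a prompt change.
before = {"easy": 0.92, "adversarial": 0.61}
after = {"easy": 0.93, "adversarial": 0.66}

for slice_name in before:
    print(f"{slice_name}: {after[slice_name] - before[slice_name]:+.2f}")
# easy: +0.01, adversarial: +0.05 -- the adversarial lift is the one to keep.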

When the rule is too long, refactor to an example

A 200-word rule is often a sign the rule is actually a demonstration in disguise. If you cannot state the rule in one sentence, try replacing it with an example.

  • “Always format dates as YYYY-MM-DD, with a hyphen, four-digit year, etc., except in fiscal year contexts where…” becomes one example showing the right format in each context.
  • “Always reply in the user’s voice, matching their tone, brevity, and formality” becomes three examples of the model matching three different user voices.

Examples carry constraints that resist explicit statement. Rules carry constraints that benefit from explicit statement. Pick based on which kind of constraint you have.
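
For the date rule in the first bullet, the refactor might look like the pair below. The fiscal-year convention shown is invented for illustration; the original rule's fiscal-year exception is domain-specific, so substitute your own.

<example>
<input>The launch happened on March 4, 2026.</input>
<output>Launch date: 2026-03-04</output>
</example>

<example>
<input>Report the launch in fiscal terms.</input>
<output>Launch: FY2026 Q3</output>
</example>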

Test both with the same eval suite

The right mix is empirical. Run the eval set with rules-only, examples-only, and combined; pick the configuration that wins on the slices that matter. A harness sketch follows the list below.

  • A rules-only baseline tells you how much the examples actually contribute.
  • An examples-only baseline tells you how much the rules contribute.
  • The combined version usually wins, but the lift over the better of the two singles tells you whether the examples earn their token cost.
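
A minimal harness for that comparison, assuming a run_eval function you supply that takes a system prompt and returns a score per slice; all names here are placeholders:

# Run the same eval set under three prompt configurations and compare.
def compare(run_eval, rules: str, examples: str) -> dict:
    configs = {
        "rules_only": rules,
        "examples_only": examples,
        "combined": rules + "\n\n" + examples,
    }
    scores = {name: run_eval(prompt) for name, prompt in configs.items()}
    for name, per_slice in scores.items():
        print(name, per_slice)
    return scores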

See evaluation for the eval pattern. See role-framing for the role rules that sit alongside both.