Overview

Context engineering is the discipline of deciding what goes into the model’s context window, in what order, and within what token budget, on every call. It is a superset of prompt engineering: the prompt is one part of the context, alongside system instructions, retrieved documents, tool definitions, conversation history, and tool results. In an application, output quality tends to track context quality more closely than prompt wording. This page covers the practice; for prompt wording see best-practices, and for the system layer see system-prompt-design-patterns.

Treat the context window as a scarce budget

Every token competes for space in a fixed window and dilutes the model’s attention. More context is not better; relevant, well-ordered context is. Budget the window across the system prompt, retrieved evidence, history, and headroom for the response, and measure how much each section actually contributes to output quality. See context-window.
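A minimal sketch of budgeting the window by priority, assuming one "token" per whitespace-separated word (a stand-in for a real tokenizer). The `Section` type, `fit_to_budget`, and `usage_report` are hypothetical names for illustration. Response headroom is reserved first, then sections are funded in priority order until the budget runs out.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: one "token" per whitespace-separated word.
    # Replace with your model's real tokenizer before trusting the numbers.
    return len(text.split())


@dataclass
class Section:
    name: str
    text: str
    priority: int  # lower value = funded first when the budget is tight


def fit_to_budget(sections: list[Section], window: int, response_headroom: int) -> dict[str, str]:
    """Fund sections in priority order until the input budget runs out.

    The last section that only partly fits is truncated; anything after it is
    dropped. Returns {name: text} in the order the sections were funded.
    """
    budget = window - response_headroom  # reserve room for the model's answer
    kept: dict[str, str] = {}
    for sec in sorted(sections, key=lambda s: s.priority):
        words = sec.text.split()
        take = min(len(words), budget)
        if take <= 0:
            continue  # no budget left: drop this section entirely
        kept[sec.name] = " ".join(words[:take])
        budget -= take
    return kept


def usage_report(kept: dict[str, str]) -> dict[str, int]:
    # Per-section token spend: the raw input for judging what each section earns.
    return {name: count_tokens(text) for name, text in kept.items()}
```

Truncating by words can cut evidence mid-sentence; in practice you would usually drop or re-rank whole chunks instead. The per-section usage report is only half of "measuring what a section earns"; the other half is comparing output quality with and without that section.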

Retrieve, do not stuff

Pull only the evidence the current step needs, ranked by relevance, rather than pasting everything that might help. Retrieval beats a giant static prompt because it scales to large corpora and stays current as the underlying documents change. Tune chunk size, top-k, and filters so the model sees signal, not volume. See rag and rag-retrieval.
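The sketch below shows where top-k, a minimum-relevance cutoff, and a metadata filter fit in a retrieval step. The scoring function is a toy term-overlap measure standing in for embeddings or BM25, and `Chunk`, `score`, and `retrieve` are hypothetical names; only the shape of the pipeline is the point.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    source: str
    tags: set[str] = field(default_factory=set)


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, chunk: Chunk) -> float:
    # Toy relevance: fraction of query terms that appear in the chunk.
    # A real system would use embeddings or BM25; this is only a stand-in.
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(chunk.text)) / len(q)


def retrieve(query: str, chunks: list[Chunk], top_k: int = 3,
             min_score: float = 0.2, required_tag: Optional[str] = None) -> list[Chunk]:
    # 1. Metadata filter: skip chunks that cannot be relevant to this step.
    candidates = [c for c in chunks if required_tag is None or required_tag in c.tags]
    # 2. Score, then drop anything below the relevance floor (signal, not volume).
    scored = [(score(query, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= min_score]
    # 3. Rank by score (stable sort keeps original order on ties) and keep top-k.
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

For example, with a small billing corpus, `retrieve("refunds order number", chunks)` returns only the chunks that share query terms, best first, and an unrelated chunk is never pasted into the context.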

Order for recency and salience

Models tend to attend more to the start and end of the context than to the middle; this is known as the lost-in-the-middle effect. Put the instruction and the most important evidence where attention is highest: stable instructions at the top, the current task and key evidence near the end. Keep low-value boilerplate out.
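One way to apply this ordering, sketched under the assumption that each evidence item already carries a relevance score: drop items below a floor, sort the rest weakest-first so the strongest item lands right before the task, and keep the stable instructions at the top. The function name `assemble` and the `min_relevance` default are hypothetical; the effect is a tendency, not a guarantee, so the ordering is a heuristic worth testing on your own evals.

```python
def assemble(instructions: str, evidence: list[tuple[float, str]], task: str,
             min_relevance: float = 0.3) -> str:
    """Build a context: stable instructions first, task and strongest evidence last.

    `evidence` is a list of (relevance, text) pairs; items below `min_relevance`
    are dropped rather than padding the middle of the window.
    """
    kept = [(s, text) for s, text in evidence if s >= min_relevance]
    # Weakest first, so the strongest evidence sits next to the task at the end.
    kept.sort(key=lambda e: e[0])
    parts = [instructions, *(text for _, text in kept), f"Task: {task}"]
    return "\n\n".join(parts)
```

Putting the instructions first also keeps them in the stable, cacheable part of the prompt, while the task and evidence change on every call.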

Compact long histories

In multi-turn and agent loops, the context grows until it overflows the window or quality degrades. Summarize old turns, drop superseded tool results, and keep a running state object instead of the raw transcript. Compaction keeps the agent coherent over long horizons; see agent-architecture-patterns and prompt-chaining.
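A minimal sketch of compaction with a running state object. The `summarize` function is a hypothetical placeholder (it just truncates turns; a real agent would ask the model for a summary), and `AgentState`, `keep_recent`, and the render format are illustrative assumptions. Old turns fold into a summary, and a new tool result for the same key overwrites the superseded one.

```python
from dataclasses import dataclass, field


def summarize(turns: list[str]) -> str:
    # Hypothetical placeholder: a real agent would ask the model to summarize.
    # Here we keep only the first 40 characters of each piece.
    return " | ".join(t[:40] for t in turns)


@dataclass
class AgentState:
    summary: str = ""
    facts: dict[str, str] = field(default_factory=dict)         # durable facts learned so far
    recent: list[str] = field(default_factory=list)             # last few raw turns
    tool_results: dict[str, str] = field(default_factory=dict)  # key -> latest result only

    def add_turn(self, turn: str, keep_recent: int = 4) -> None:
        self.recent.append(turn)
        if len(self.recent) > keep_recent:
            old = self.recent[:-keep_recent]
            self.recent = self.recent[-keep_recent:]
            # Fold evicted turns (and the previous summary) into a new summary.
            self.summary = summarize(([self.summary] if self.summary else []) + old)

    def add_tool_result(self, key: str, result: str) -> None:
        # Overwrite: a newer result for the same call supersedes the old one.
        self.tool_results[key] = result

    def render(self) -> str:
        # What goes into the context instead of the raw transcript.
        lines = []
        if self.summary:
            lines.append(f"Summary of earlier turns: {self.summary}")
        lines += [f"Fact {k}: {v}" for k, v in self.facts.items()]
        lines += [f"Tool {k}: {v}" for k, v in self.tool_results.items()]
        lines += self.recent
        return "\n".join(lines)
```

The rendered state has a bounded size for raw turns and tool results, so it cannot grow until it crowds out the system prompt; the summary itself still needs a length cap in a real system.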

Keep stable prefixes for cache hits

Place the parts that do not change between calls (system prompt, tool definitions, few-shot examples) at the front so they hit the prompt cache. Editing or reordering anything in the prefix invalidates the cache from that point onward, which raises cost and latency. See prompt-caching-strategies and prompt-cache.
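The sketch below keeps the stable parts in a byte-identical prefix and fingerprints it so accidental changes show up in monitoring. It assumes a provider whose cache matches on an exact prefix; minimum cacheable lengths and other details vary by provider. `build_prompt`, `prefix_fingerprint`, and `PrefixMonitor` are hypothetical names.

```python
import hashlib
import json


def build_prompt(system: str, tools: list[dict], examples: list[str],
                 history: list[str], user_msg: str) -> tuple[str, str]:
    # Stable prefix: must be identical bytes on every call to be reusable.
    # sort_keys=True keeps tool JSON identical even if dict key order varies.
    prefix = "\n".join([system, json.dumps(tools, sort_keys=True), *examples])
    # Volatile suffix: changes every call, so it goes after the prefix.
    suffix = "\n".join([*history, user_msg])
    return prefix, suffix


def prefix_fingerprint(prefix: str) -> str:
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:16]


class PrefixMonitor:
    """Counts how often the prefix changes between consecutive calls."""

    def __init__(self) -> None:
        self.last = None
        self.changes = 0

    def check(self, prefix: str) -> str:
        fp = prefix_fingerprint(prefix)
        if self.last is not None and fp != self.last:
            self.changes += 1  # this call likely misses the prefix cache
        self.last = fp
        return fp
```

A rising change count usually points to something volatile leaking into the prefix, such as a timestamp or request ID in the system prompt; move it into the suffix instead.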

Verify what the model actually saw

Log the assembled context per call, not just the prompt template. Many “the model ignored the instruction” bugs are context bugs: the instruction was truncated, buried in the middle, or crowded out by retrieved text. Inspect the real window before blaming the model.
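A minimal sketch of logging what the model actually saw and flagging the three failure modes above. It measures in characters rather than tokens and assumes overflow is cut from the end of the context; both are simplifications, and the 0.2/0.8 "middle" thresholds are arbitrary. `inspect_context` is a hypothetical helper built on the standard `logging` and `json` modules.

```python
import json
import logging

logger = logging.getLogger("context")


def inspect_context(context: str, instruction: str, window_chars: int) -> dict:
    """Record what the model will actually see and flag common context bugs."""
    visible = context[:window_chars]  # assumption: overflow is cut from the end
    pos = visible.find(instruction)
    report = {
        "chars": len(context),
        "truncated": len(context) > window_chars,
        "instruction_present": pos != -1,
        # Relative position of the instruction: near 0 = top, near 1 = end.
        "instruction_position": round(pos / max(1, len(visible)), 2) if pos != -1 else None,
    }
    if not report["instruction_present"]:
        report["warning"] = "instruction missing from the visible window"
    elif 0.2 < report["instruction_position"] < 0.8:  # arbitrary "middle" band
        report["warning"] = "instruction sits in the middle of the context"
    logger.info("assembled context: %s", json.dumps(report))
    logger.debug("full context:\n%s", visible)  # the real window, not the template
    return report
```

Logging the full visible window at debug level makes it possible to reproduce a bad call exactly, which a template plus variables often cannot do.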

Pitfalls

  • Maxing the window because it is available; irrelevant tokens lower accuracy and raise cost.
  • Putting the key instruction in the middle of a long context.
  • Letting agent history grow unbounded until it evicts the system prompt.
  • Editing a cached prefix on every call and paying full price each time.