Overview

LLMOps is the operational discipline for shipping and running LLM features: versioning the prompt and model, gating releases on evaluations, observing behavior in production, and controlling cost and risk. It adapts MLOps to systems whose core component is nondeterministic and changes underneath you when a provider updates a model. This page is the ops pillar; for measurement see llm-evaluation-in-production, for monitoring see llm-observability, and for the application-side reliability rules see reliable-agents-in-production. Dated 2026-06; the tooling here moves fast.

Version prompts and models like code

The prompt and the model id are deployable artifacts. Keep prompts in version control next to the code, tag each version, and pin the exact model id rather than a floating alias so a silent provider update cannot change behavior. Record which prompt and model served each request. See prompt-evals.
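
As a minimal sketch of the idea above, a prompt and a pinned model id can be bundled into one immutable record whose content hash serves as the version tag. The class name, field names, and the model id below are hypothetical placeholders, not any provider's API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptRelease:
    """One deployable (prompt, model) pair. All names here are hypothetical."""

    name: str
    prompt_text: str
    model_id: str  # a pinned, dated id rather than a floating alias

    @property
    def prompt_hash(self) -> str:
        # Content hash: any edit to the prompt text yields a new version tag.
        return hashlib.sha256(self.prompt_text.encode("utf-8")).hexdigest()[:12]

    def request_tag(self) -> dict:
        # Attach this to every request log line or trace span.
        return {
            "prompt_name": self.name,
            "prompt_hash": self.prompt_hash,
            "model_id": self.model_id,
        }


SUMMARIZE = PromptRelease(
    name="summarize",
    prompt_text="Summarize the following text in three bullet points.",
    model_id="example-model-2026-01-15",  # placeholder id, not a real model
)
```

Logging `SUMMARIZE.request_tag()` with each request is one way to record which prompt and model served it; a git tag on the prompt file would work equally well.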

Gate every release on an eval

Do not ship a prompt or model change on vibes. Maintain a golden set of representative inputs with expected outcomes, run it in CI on every change, and block the release if the pass rate regresses. Evals are the LLMOps equivalent of a test suite. See llm-evaluation-in-production and golden-set.
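
The gate described above can be sketched as a small script that CI runs on every change. This is an illustration under stated assumptions: the golden set, the substring check, the baseline value, and `fake_model` are all made up so the sketch runs offline; a real gate would call the candidate prompt and model and use a grader suited to the task.

```python
from typing import Callable, List, Tuple

# Hypothetical golden set: (input, substring the output must contain).
GOLDEN_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]

BASELINE_PASS_RATE = 1.0  # assumed pass rate of the currently released version


def pass_rate(generate: Callable[[str], str], golden: List[Tuple[str, str]]) -> float:
    # Fraction of golden inputs whose output contains the expected text.
    passed = sum(1 for prompt, expected in golden if expected in generate(prompt))
    return passed / len(golden)


def gate(generate: Callable[[str], str],
         golden: List[Tuple[str, str]] = GOLDEN_SET,
         baseline: float = BASELINE_PASS_RATE) -> bool:
    # The release is allowed only if the candidate does not regress.
    return pass_rate(generate, golden) >= baseline


def fake_model(prompt: str) -> str:
    # Stand-in for the real model call.
    answers = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "cold"}
    return answers.get(prompt, "")


def main() -> int:
    ok = gate(fake_model)
    print("eval gate:", "pass" if ok else "FAIL")
    return 0 if ok else 1  # CI would call sys.exit(main()) so a failure blocks the release
```

A nonzero exit code is what blocks the release; most CI systems fail the job on it without extra configuration.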

Instrument before you scale

Trace every request: prompt, model, token counts, latency, tool calls, and outcome. Without traces, a production regression cannot be reproduced. Build on general observability practice; see llm-observability and observability.
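
One way to capture those fields is a context manager wrapped around each model call. The sketch below uses only the standard library; `TRACE_SINK`, the field names, and the token counts are assumptions for illustration, and a real system would export to a tracing backend instead of a list.

```python
import json
import time
import uuid
from contextlib import contextmanager

TRACE_SINK = []  # stand-in for a real exporter (file, queue, tracing backend)


@contextmanager
def traced_request(prompt_version: str, model_id: str):
    record = {
        "trace_id": uuid.uuid4().hex,
        "prompt_version": prompt_version,
        "model_id": model_id,
        "input_tokens": None,
        "output_tokens": None,
        "tool_calls": [],
        "outcome": "ok",
    }
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Record the failure type, then let the caller handle the error.
        record["outcome"] = f"error:{type(exc).__name__}"
        raise
    finally:
        # Latency is measured and the record emitted on success and failure alike.
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        TRACE_SINK.append(json.dumps(record))


with traced_request("summarize@3f2a", "example-model-2026-01-15") as span:
    # The real model call would go here; these values are made up.
    span["input_tokens"] = 120
    span["output_tokens"] = 45
    span["tool_calls"].append("search")
```

Emitting the record in `finally` means failed requests are traced too, which is usually where the trace matters most.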

Control cost as a first-class metric

LLM spend scales with traffic and can spike without warning. Cache stable prefixes, route easy requests to cheap models, batch where latency allows, and cap spend per user and per request. Track cost per request next to latency. See cost-control.
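
A rough sketch of per-request cost tracking with the two caps mentioned above. The prices, cap values, and model names are invented for illustration; real prices vary by provider and change over time.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens.
PRICES = {
    "cheap-model": {"input": 0.50, "output": 1.50},
    "strong-model": {"input": 5.00, "output": 15.00},
}
REQUEST_CAP_USD = 0.05    # assumed per-request cap
USER_DAILY_CAP_USD = 1.00  # assumed per-user cap


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def pick_model(is_easy: bool) -> str:
    # Route easy requests to the cheap model; how "easy" is decided is out of scope here.
    return "cheap-model" if is_easy else "strong-model"


class Budget:
    def __init__(self) -> None:
        self.spent = defaultdict(float)  # user id -> USD spent in the current window

    def allow(self, user_id: str, estimated_cost: float) -> bool:
        if estimated_cost > REQUEST_CAP_USD:
            return False
        return self.spent[user_id] + estimated_cost <= USER_DAILY_CAP_USD

    def record(self, user_id: str, cost: float) -> None:
        self.spent[user_id] += cost
```

With these made-up prices, a request with 1,000 input and 500 output tokens costs $0.0125 on the strong model and $0.00125 on the cheap one, which is the gap routing is meant to exploit.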

Build the safety and abuse layer

Treat user and retrieved content as untrusted, filter inputs and outputs, rate-limit per user, and log refusals. Prompt injection and data exfiltration are operational risks, not just model behavior. See prompt-injection-defense.
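
For the rate-limiting and refusal-logging part, a per-user token bucket is one common approach. The sketch below is an assumption about how such a limiter might look, not a reference design; the class names and the in-memory `refusals` list are hypothetical, and input and output filtering are not shown.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserLimiter:
    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self._args = (rate_per_sec, capacity, clock)
        self._buckets = {}
        self.refusals = []  # stand-in for a real refusal log

    def allow(self, user_id: str) -> bool:
        if user_id not in self._buckets:
            self._buckets[user_id] = TokenBucket(*self._args)
        allowed = self._buckets[user_id].allow()
        if not allowed:
            self.refusals.append(user_id)  # log every refusal for later review
        return allowed
```

The clock is injectable so the limiter can be tested without sleeping; production code would leave the default `time.monotonic`.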

Ship with rollback and a fallback ladder

Deploy prompt and model changes behind a flag, canary to a slice of traffic, and keep the previous version one switch away. At runtime, fall back from the strong model to a cheaper model and then to a deterministic default when the primary is slow, rate-limited, or down. Plan the rollback before the rollout.
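
The runtime fallback ladder can be sketched as a loop over rungs that ends in a fixed response. `ProviderUnavailable`, the rung names, and the default message are hypothetical; a real client would map its own timeout and rate-limit errors onto the exceptions caught here.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error for a rate-limited or down provider."""


def deterministic_default(prompt: str) -> str:
    # Last rung: a fixed response that cannot fail.
    return "This feature is temporarily unavailable."


def call_with_fallback(
    prompt: str,
    rungs: List[Tuple[str, Callable[[str], str]]],
    default: Callable[[str], str] = deterministic_default,
) -> Tuple[str, str]:
    # Try each rung in order; return (rung name, response) from the first that succeeds.
    for name, call in rungs:
        try:
            return name, call(prompt)
        except (TimeoutError, ProviderUnavailable):
            continue  # move down the ladder
    return "default", default(prompt)


def strong_model(prompt: str) -> str:
    raise TimeoutError("simulated slow primary")


def cheap_model(prompt: str) -> str:
    return "cheap:" + prompt


rung_used, response = call_with_fallback(
    "hello", [("strong", strong_model), ("cheap", cheap_model)]
)
```

Returning the rung name alongside the response lets the trace record which rung actually served the request, so fallback rates can be monitored.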

Verification

  • Confirm the eval gate runs in CI and blocks on regression.
  • Confirm every request is traced with prompt and model version.
  • Force a provider timeout in staging; confirm the fallback ladder engages.
  • Confirm cost caps and abuse rate limits fire under load.