Definition
A completion is the generated output returned by a language model given an input prompt. In the original completion API paradigm (GPT-3, early OpenAI), the model received raw text and generated a continuation. Modern APIs use a chat/messages format where input is structured as a list of turns, but the output is still called a completion.
A completion consists of one or more content blocks. In most APIs:
- text blocks contain generated text.
- tool_use blocks contain structured function calls (see tool-call).
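A minimal sketch of dispatching on block types. Real SDK responses use typed objects; plain dicts stand in for them here, and collect_blocks is a hypothetical helper, not part of any SDK.

```python
def collect_blocks(content):
    """Split a completion's content blocks into text and tool calls.

    Toy version: blocks are plain dicts with a "type" key, mirroring
    the shape of SDK content blocks."""
    texts, tool_calls = [], []
    for block in content:
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "tool_use":
            tool_calls.append({"name": block["name"], "input": block["input"]})
    return texts, tool_calls

# A completion mixing a text block and a tool_use block.
content = [
    {"type": "text", "text": "Checking the weather."},
    {"type": "tool_use", "name": "get_weather", "input": {"city": "Oslo"}},
]
texts, calls = collect_blocks(content)
```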
Key completion parameters:
- max_tokens: upper bound on output length in tokens.
- temperature: controls randomness; 0 is near-deterministic, 1 samples from the model's full distribution.
- top_p: nucleus sampling threshold; an alternative to temperature.
- stop_sequences: strings that halt generation early (see stop-sequence).
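Temperature and top_p can both be understood as transforms on the next-token distribution. A minimal sketch over a toy three-token vocabulary (function names are illustrative, not any SDK's API):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize.
    Lower temperature sharpens the distribution toward the top token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; renormalize over that set (the rest get 0)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.1)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # full distribution
trimmed = nucleus(warm, 0.7)                  # tail token zeroed out
```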
The model generates one token at a time, sampling from a probability distribution over the vocabulary at each step. The sequence of tokens produced is the completion. The model cannot revise earlier tokens once generated.
stop_reason indicates why generation ended: "end_turn" (model decided to stop), "max_tokens" (hit the limit), "stop_sequence" (a stop sequence matched).
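The loop and the three stop reasons can be sketched with a toy greedy decoder. Everything here (END_TURN, generate, make_stub) is a hypothetical stand-in for the model and runtime, not a real API.

```python
END_TURN = "<eot>"  # hypothetical end-of-turn token

def generate(next_token, max_tokens, stop_sequences=()):
    """Toy decoding loop: emit one token at a time and report why we
    stopped. `next_token` maps the text so far to the next token."""
    tokens = []
    while True:
        tok = next_token("".join(tokens))
        if tok == END_TURN:
            return "".join(tokens), "end_turn"
        tokens.append(tok)
        text = "".join(tokens)
        for s in stop_sequences:
            if text.endswith(s):
                # The matched stop sequence is trimmed from the output,
                # as APIs commonly do.
                return text[: len(text) - len(s)], "stop_sequence"
        if len(tokens) >= max_tokens:
            return text, "max_tokens"

def make_stub(tokens_out):
    """A stub 'model' that replays a fixed token sequence."""
    it = iter(tokens_out)
    return lambda _text: next(it)

done, r1 = generate(make_stub(["Hi", "!", END_TURN]), max_tokens=10)
cut, r2 = generate(make_stub(["a", "b", "c", "d"]), max_tokens=3)
halted, r3 = generate(make_stub(["x", "\n\n", "y"]), max_tokens=10,
                      stop_sequences=("\n\n",))
```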
When it applies
Treat every completion as potentially truncated: always check stop_reason. If it is "max_tokens", the output was cut off; increase max_tokens or split the task into smaller parts. For deterministic pipelines (classification, structured extraction), set temperature = 0. For creative generation, set temperature between 0.7 and 1.0.
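The check-and-retry pattern for truncation can be sketched as follows. complete_untruncated is a hypothetical helper, and `call` is a generic stand-in for a real API client that returns (text, stop_reason).

```python
def complete_untruncated(call, prompt, max_tokens=256, limit=4096):
    """Request a completion; if it was cut off (stop_reason ==
    "max_tokens"), retry with a doubled token budget up to `limit`."""
    while True:
        text, stop_reason = call(prompt, max_tokens)
        if stop_reason != "max_tokens" or max_tokens >= limit:
            return text, stop_reason
        max_tokens *= 2  # budget was too small; double and retry

# Fake client for illustration: needs a 600-token budget to finish.
def fake_call(prompt, max_tokens):
    if max_tokens >= 600:
        return "full answer", "end_turn"
    return "partial ans"[:max_tokens], "max_tokens"

text, reason = complete_untruncated(fake_call, "explain X")
```

Splitting the task into smaller requests is usually preferable to retrying when outputs are truncated repeatedly, since each retry re-sends the whole prompt.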
Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain backpressure in one paragraph."}],
)
print(f"Stop reason: {message.stop_reason}")
print(message.content[0].text)
Related concepts
- token - completions are generated one token at a time.
- stop-sequence - stop sequences end a completion early when a chosen string appears in the output.
- temperature - temperature controls the randomness of each token selection.
- prompt-cache - prompt caching reduces the cost of repeated completions over the same context.
- prompt-design - the prompting deep-dive on eliciting useful completions.
Citing this term
See Completion (llmbestpractices.com/glossary/completion).