Definition
A completion is the generated output returned by a language model given an input prompt. In the original completion API paradigm (GPT-3, early OpenAI), the model received raw text and generated a continuation. Modern APIs use a chat/messages format where input is structured as a list of turns, but the output is still called a completion.
A completion consists of one or more content blocks. In most APIs:
- text blocks contain generated text.
- tool_use blocks contain structured function calls (see tool-call).
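A minimal sketch of dispatching on block types. Real SDK responses use typed objects; plain dicts stand in for them here, and collect_blocks is a hypothetical helper, not part of any SDK.

```python
def collect_blocks(content):
    """Split a completion's content blocks into text and tool calls.

    Toy version: blocks are plain dicts with a "type" key, mirroring
    the shape of SDK content blocks."""
    texts, tool_calls = [], []
    for block in content:
        if block["type"] == "text":
            texts.append(block["text"])
        elif block["type"] == "tool_use":
            tool_calls.append({"name": block["name"], "input": block["input"]})
    return texts, tool_calls

# A completion mixing a text block and a tool_use block.
content = [
    {"type": "text", "text": "Checking the weather."},
    {"type": "tool_use", "name": "get_weather", "input": {"city": "Oslo"}},
]
texts, calls = collect_blocks(content)
```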
Key completion parameters:
- max_tokens: upper bound on output length in tokens.
- temperature: controls randomness; 0 is near-deterministic, 1 samples from the model's full distribution.
- top_p: nucleus sampling threshold; an alternative to temperature.
- stop_sequences: strings that halt generation early (see stop-sequence).
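Temperature and top_p can both be understood as transforms on the next-token distribution. A minimal sketch over a toy three-token vocabulary (function names are illustrative, not any SDK's API):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize.
    Lower temperature sharpens the distribution toward the top token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p; renormalize over that set (the rest get 0)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.1)  # near-deterministic
warm = softmax_with_temperature(logits, 1.0)  # full distribution
trimmed = nucleus(warm, 0.7)                  # tail token zeroed out
```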
The model generates one token at a time, sampling from a probability distribution over the vocabulary at each step. The sequence of tokens produced is the completion. The model cannot revise earlier tokens once generated.
stop_reason indicates why generation ended: "end_turn" (model decided to stop), "max_tokens" (hit the limit), "stop_sequence" (a stop sequence matched).
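The loop and the three stop reasons can be sketched with a toy greedy decoder. Everything here (END_TURN, generate, make_stub) is a hypothetical stand-in for the model and runtime, not a real API.

```python
END_TURN = "<eot>"  # hypothetical end-of-turn token

def generate(next_token, max_tokens, stop_sequences=()):
    """Toy decoding loop: emit one token at a time and report why we
    stopped. `next_token` maps the text so far to the next token."""
    tokens = []
    while True:
        tok = next_token("".join(tokens))
        if tok == END_TURN:
            return "".join(tokens), "end_turn"
        tokens.append(tok)
        text = "".join(tokens)
        for s in stop_sequences:
            if text.endswith(s):
                # The matched stop sequence is trimmed from the output,
                # as APIs commonly do.
                return text[: len(text) - len(s)], "stop_sequence"
        if len(tokens) >= max_tokens:
            return text, "max_tokens"

def make_stub(tokens_out):
    """A stub 'model' that replays a fixed token sequence."""
    it = iter(tokens_out)
    return lambda _text: next(it)

done, r1 = generate(make_stub(["Hi", "!", END_TURN]), max_tokens=10)
cut, r2 = generate(make_stub(["a", "b", "c", "d"]), max_tokens=3)
halted, r3 = generate(make_stub(["x", "\n\n", "y"]), max_tokens=10,
                      stop_sequences=("\n\n",))
```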
When it applies
Treat every completion as potentially truncated: always check stop_reason. If it is "max_tokens", the output was cut off; increase max_tokens or split the task into smaller parts. For deterministic pipelines (classification, structured extraction), set temperature = 0. For creative generation, set temperature between 0.7 and 1.0.
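The check-and-retry pattern for truncation can be sketched as follows. complete_untruncated is a hypothetical helper, and `call` is a generic stand-in for a real API client that returns (text, stop_reason).

```python
def complete_untruncated(call, prompt, max_tokens=256, limit=4096):
    """Request a completion; if it was cut off (stop_reason ==
    "max_tokens"), retry with a doubled token budget up to `limit`."""
    while True:
        text, stop_reason = call(prompt, max_tokens)
        if stop_reason != "max_tokens" or max_tokens >= limit:
            return text, stop_reason
        max_tokens *= 2  # budget was too small; double and retry

# Fake client for illustration: needs a 600-token budget to finish.
def fake_call(prompt, max_tokens):
    if max_tokens >= 600:
        return "full answer", "end_turn"
    return "partial ans"[:max_tokens], "max_tokens"

text, reason = complete_untruncated(fake_call, "explain X")
```

Splitting the task into smaller requests is usually preferable to retrying when outputs are truncated repeatedly, since each retry re-sends the whole prompt.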
Example
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain backpressure in one paragraph."}],
)
print(f"Stop reason: {message.stop_reason}")
print(message.content[0].text)
Related concepts
- token - completions are generated one token at a time.
- stop-sequence - stop sequences end a completion early when a chosen string appears in the output.
- temperature - temperature controls the randomness of each token selection.
- prompt-cache - prompt caching reduces the cost of repeated completions over the same context.
- prompt-design - the prompting deep-dive on eliciting useful completions.
Citing this term
See Completion (llmbestpractices.com/glossary/completion).