Definition

Knowledge distillation is a training technique where a smaller student model is trained to match the behavior of a larger teacher model. The teacher generates outputs (soft labels, logit distributions, or full completions) on a dataset; the student is trained to reproduce them. The result is a compact model that retains most of the teacher’s quality on the target task at a fraction of the inference cost.

Two main forms:

  1. Output distillation (black-box): the teacher generates text completions on a prompt set, and the student fine-tunes on those completions as its training targets. Because only the teacher's text outputs are needed, this works even with API-only access to the teacher.

  2. Logit distillation (white-box): the student minimizes the KL divergence between its per-token probability distribution and the teacher's. This requires access to the teacher's logits, but because the full distribution carries more signal per token than a single sampled completion, it is typically more data-efficient than output distillation.
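
A minimal sketch of that logit-distillation loss, assuming PyTorch, white-box access to both models, and student/teacher logits over a shared vocabulary; the function name, temperature default, and T^2 scaling are illustrative conventions, not a prescribed implementation:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then compute the KL
    # divergence from the teacher's distribution to the student's.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2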

Distillation differs from fine-tuning in that the training signal is machine-generated by a stronger model rather than human-labeled. It is also distinct from quantization, which shrinks a model by lowering the numeric precision of its existing weights rather than by training a smaller one.

When it applies

Use distillation when you need a smaller, faster, cheaper model for a specific task and a strong teacher model is available. Collect teacher outputs on a representative dataset (1,000 to 100,000+ examples). Evaluate the student against the teacher on a held-out test set using an evaluation harness. Distillation works best when the task is well-defined and the teacher’s outputs are high quality.
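
A sketch of that held-out comparison, assuming a hypothetical score() task metric (exact match, rubric grading, etc.) and a student_generate() wrapper around the student's inference; neither name comes from a specific harness:

import random

def student_vs_teacher(examples, student_generate, score, test_fraction=0.1):
    # Hold out a random slice of the teacher-labeled examples as a test set
    # (in practice, split before fine-tuning so test prompts never appear in training).
    random.shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    held_out = examples[:n_test]
    # Score each student output against the teacher completion for the same prompt.
    scores = [score(student_generate(ex["prompt"]), ex["completion"])
              for ex in held_out]
    return sum(scores) / len(scores)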

Do not expect the student to generalize beyond the teacher’s distribution; distillation is task-specific.

Example
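
The script below collects teacher completions for output distillation with the Anthropic Python SDK; load_training_prompts() stands in for however you load your own task prompts.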

import json

import anthropic

client = anthropic.Anthropic()

prompts = load_training_prompts()  # your task dataset
teacher_examples = []

for prompt in prompts:
    # Ask the teacher model for a completion on each training prompt.
    response = client.messages.create(
        model="claude-opus-4-5",  # teacher
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )
    teacher_examples.append({
        "prompt": prompt,
        "completion": response.content[0].text  # training target for the student
    })

# Write one JSON object per line; this file becomes the student's fine-tuning set.
with open("distillation_train.jsonl", "w") as f:
    for example in teacher_examples:
        f.write(json.dumps(example) + "\n")
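
The resulting distillation_train.jsonl is the student's fine-tuning dataset; hold part of it out for the teacher-versus-student comparison described above.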

Related terms

  • fine-tuning - distillation uses machine-generated labels; fine-tuning uses human-labeled data. Both update model weights.
  • evaluation-harness - measure whether the student matches the teacher on the target task.
  • golden-set - use a golden set to cap acceptable quality degradation from teacher to student.
  • completion - distillation uses completions from the teacher as training targets.
  • prompt-design - prompting the teacher to produce high-quality training examples is critical.

Citing this term

See Distillation (llmbestpractices.com/glossary/distillation).