Overview

This page gives the atomic definition of a token. Prompt budgeting and cost management are covered at prompt-design.

Definition

A token is the atomic unit of text a language model encodes and processes. Tokenizers (OpenAI uses Byte Pair Encoding via tiktoken; Anthropic uses a similar BPE scheme) split text into subword units. Common English words are a single token; rare words, numbers, and code identifiers are often split into multiple tokens.

Rough rule of thumb for English: 1 token per 4 characters, or 100 tokens per 75 words. Chinese, Japanese, and Korean are denser: 1 character may be 1-2 tokens but represents more information.

Tokens matter for three reasons:

  • The context-window is measured in tokens.
  • API pricing is per-token, with input and output priced separately.
  • The model generates one token at a time at inference, so longer outputs cost more and take longer.

Count tokens before making API calls using the provider's tokenizer library to budget accurately.
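
A minimal counting sketch using OpenAI's tiktoken library. The encoding name cl100k_base is an assumption here; in practice, match the encoding to your model (tiktoken.encoding_for_model does this lookup for OpenAI models).

```python
# Requires: pip install tiktoken
import tiktoken

# "cl100k_base" is assumed; use tiktoken.encoding_for_model("<model>")
# to resolve the encoding for a specific OpenAI model.
enc = tiktoken.get_encoding("cl100k_base")

text = "ChatGPT is a large language model"
token_ids = enc.encode(text)           # list of integer token IDs
print(len(token_ids), "tokens")        # the count used for budgeting
print(enc.decode(token_ids) == text)   # encoding round-trips losslessly
```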

When it applies

Count tokens when designing prompts that approach the context limit, when estimating API costs, and when implementing streaming responses that should stop at a target length.
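
A sketch of a pre-call budget check combining the count with the context limit. The limit, the per-token prices, and the check_budget helper are illustrative assumptions, not real model limits or pricing; substitute your provider's published numbers.

```python
import tiktoken

CONTEXT_LIMIT = 8192              # assumed window size; check your model's docs
PRICE_PER_INPUT_TOKEN = 0.00001   # illustrative placeholder, not real pricing
PRICE_PER_OUTPUT_TOKEN = 0.00003  # output tokens are typically priced higher

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

def check_budget(prompt: str, max_output_tokens: int) -> float:
    """Reject prompts that cannot fit, and return an estimated cost in dollars."""
    n_input = len(enc.encode(prompt))
    if n_input + max_output_tokens > CONTEXT_LIMIT:
        raise ValueError(
            f"{n_input} input + {max_output_tokens} output tokens "
            f"exceeds the {CONTEXT_LIMIT}-token window"
        )
    return (n_input * PRICE_PER_INPUT_TOKEN
            + max_output_tokens * PRICE_PER_OUTPUT_TOKEN)
```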

Example

The phrase “ChatGPT is a large language model” may tokenize as: Chat, G, PT, is, a, large, language, model = 8 tokens. Exact splits vary by tokenizer.

The word “antidisestablishmentarianism” may tokenize into 6-8 tokens.
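
To inspect splits like these yourself, decode each token ID individually. A sketch with tiktoken; cl100k_base is an assumed encoding, so your counts and splits may differ from the examples above.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

for text in ("ChatGPT is a large language model",
             "antidisestablishmentarianism"):
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]  # one string per token
    print(f"{text!r}: {len(ids)} tokens -> {pieces}")
```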

Related terms

  • context-window - context windows are measured in tokens.
  • temperature - temperature controls the probability distribution over the next token.
  • top-p - top-p filters the token vocabulary before sampling.
  • prompt-design - prompt budgeting starts with accurate token counts.
  • semantic-cache - cache hit rates depend on prompt token stability.

Citing this term

See Token (llmbestpractices.com/glossary/token).