Overview

Tool use, also called function calling, lets an LLM invoke external functions: the model emits a structured call, your code executes it, and the result is returned to the model. Reliable tool use is a schema-design problem: the model calls the right tool with valid arguments when the tool is named, typed, and described well, and fails when it is not. This page covers the provider-agnostic rules; for the Model Context Protocol specifics see mcp-tool-design, and for the loop that consumes tool results see agent-architecture-patterns.

Define one clear job per tool

Each tool should do one thing that its name can describe. A focused search_orders tool beats a catch-all do_stuff tool with a mode flag. A focused tool is easier for the model to select correctly and easier for you to validate. Overlapping tools cause the model to pick the wrong one; keep the set small and distinct.
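A minimal sketch of a one-job-per-tool registry, assuming a small in-memory order list. The get_order_status tool, the ORDERS data, and the dispatch helper are hypothetical names introduced for illustration; only search_orders comes from the text above.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical in-memory data standing in for a real order store.
ORDERS = [
    {"order_id": "A1", "customer": "ana", "status": "shipped"},
    {"order_id": "B2", "customer": "ben", "status": "pending"},
]


def search_orders(customer: str) -> list[dict[str, Any]]:
    """Return every order placed by one customer."""
    return [o for o in ORDERS if o["customer"] == customer]


def get_order_status(order_id: str) -> str | None:
    """Return the status of one order, or None if the ID is unknown."""
    for order in ORDERS:
        if order["order_id"] == order_id:
            return order["status"]
    return None


# One entry per tool, each with a single job its name describes.
# There is no shared "mode" argument that switches behavior.
TOOLS: dict[str, Callable[..., Any]] = {
    "search_orders": search_orders,
    "get_order_status": get_order_status,
}


def dispatch(name: str, arguments: dict[str, Any]) -> Any:
    """Run the handler registered under the tool name the model emitted."""
    return TOOLS[name](**arguments)
```

Because each handler has its own signature, a wrong selection tends to surface as an argument mismatch rather than as a silent call with the wrong mode.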

Write the description for the model, not the docs

The tool description is a prompt. State what the tool does, when to use it, when not to, and what it returns, in plain language. Name units and formats in the description, not just the schema. A good description prevents the most common failure: the model calling a valid tool at the wrong time. See mcp-tool-design.
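One way to keep descriptions complete is to assemble them from the four parts listed above. The sketch below is illustrative only: build_description and the example wording, including the total_cents and placed_at field names, are assumptions, not part of any provider's API.

```python
def build_description(does: str, use_when: str, avoid_when: str, returns: str) -> str:
    """Join the four parts a tool description should cover into one string."""
    return " ".join([
        does,
        f"Use when {use_when}",
        f"Do not use when {avoid_when}",
        f"Returns {returns}",
    ])


# Hypothetical description for a search_orders tool. Units and formats
# are spelled out in prose, not left to the schema alone.
SEARCH_ORDERS_DESCRIPTION = build_description(
    does="Searches a customer's orders by email address.",
    use_when="the user asks about their orders but has not given an order ID.",
    avoid_when="an order ID is already known.",
    returns=(
        "a list of orders, newest first; total_cents is an integer in "
        "US cents and placed_at is an ISO 8601 date (YYYY-MM-DD)."
    ),
)

print(SEARCH_ORDERS_DESCRIPTION)
```

Requiring all four arguments makes it harder to ship a tool whose description omits the "when not to use it" guidance.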

Type every argument and constrain the schema

Declare each parameter with a JSON Schema type, mark required fields, use enums for closed sets, and add format hints (date, email). Constrained schemas shrink the space of malformed calls. This is the same discipline as structured-output and output-constraints.
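As an illustration, a parameter schema for a hypothetical search_orders tool might look like the sketch below. The property names and limits are assumptions; the keywords (type, enum, format, required, minimum, maximum, additionalProperties) are standard JSON Schema. Note that many validators treat format as advisory unless configured to enforce it.

```python
import json

# Hypothetical parameter schema for a search_orders tool.
SEARCH_ORDERS_PARAMETERS = {
    "type": "object",
    "properties": {
        # Format hint tells the model what shape of string is expected.
        "customer_email": {"type": "string", "format": "email"},
        # Enum: a closed set instead of free text.
        "status": {
            "type": "string",
            "enum": ["pending", "shipped", "delivered", "cancelled"],
        },
        "placed_after": {
            "type": "string",
            "format": "date",
            "description": "Only return orders placed after this date (YYYY-MM-DD).",
        },
        # Range limits keep out-of-bounds values out of the call.
        "limit": {"type": "integer", "minimum": 1, "maximum": 20},
    },
    "required": ["customer_email"],
    # Reject arguments the schema does not declare.
    "additionalProperties": False,
}

# The schema is plain data, so it serializes directly to JSON.
print(json.dumps(SEARCH_ORDERS_PARAMETERS, indent=2))
```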

Validate inputs before executing

Never trust the arguments the model emits. Validate against the schema, then against business rules, before the call touches a system. A model can hallucinate an ID or an out-of-range value. Validation is also the first line of defense when tool arguments are derived from untrusted text; see prompt-injection-defense.
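The two validation stages can be sketched as follows. This is a simplified, hand-rolled check using only the standard library; a real system would more likely run a JSON Schema validator for the first stage. The update_order_status tool, its arguments, and KNOWN_ORDER_IDS are hypothetical.

```python
from __future__ import annotations

from typing import Any

ALLOWED_STATUSES = {"pending", "shipped", "delivered", "cancelled"}
KNOWN_ORDER_IDS = {"A1", "B2"}  # hypothetical stand-in for a database lookup


def validate_update_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems: list[str] = []

    # Stage 1: schema-level checks (types, closed sets, unexpected keys).
    order_id = args.get("order_id")
    status = args.get("status")
    if not isinstance(order_id, str):
        problems.append("order_id must be a string")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    unexpected = sorted(set(args) - {"order_id", "status"})
    if unexpected:
        problems.append(f"unexpected arguments: {unexpected}")
    if problems:
        return problems  # do not run business rules on malformed input

    # Stage 2: business rules (the model may have hallucinated the ID).
    if order_id not in KNOWN_ORDER_IDS:
        problems.append(f"order_id {order_id!r} does not exist")
    return problems
```

Only when the returned list is empty should the call reach the underlying system; otherwise the problems can be sent back to the model as the tool result.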

Return errors the model can act on

When a call fails, return a structured error that says what went wrong and how to fix it, not a stack trace. “order_id not found; call search_orders first” lets the model recover; “Error 500” makes it retry blindly. Useful tool-result errors turn a dead end into a self-correction.
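A minimal sketch of an actionable error, assuming a dictionary-backed order store. The ok/error/hint envelope is one possible shape, not a standard, and get_order and ORDERS are hypothetical names.

```python
from __future__ import annotations

from typing import Any

ORDERS = {"A1": {"status": "shipped"}}  # hypothetical order store


def get_order(order_id: str) -> dict[str, Any]:
    """Look up one order and return a result the model can act on."""
    try:
        order = ORDERS[order_id]
    except KeyError:
        # Say what went wrong and what to do next, not a stack trace.
        return {
            "ok": False,
            "error": {
                "code": "order_not_found",
                "message": f"order_id {order_id!r} not found",
                "hint": "call search_orders first to find a valid order_id",
            },
        }
    return {"ok": True, "result": {"order_id": order_id, **order}}
```

The hint field names the recovery step explicitly, so the model's next call can be search_orders instead of a blind retry of the same arguments.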

Keep the tool surface minimal and safe

Expose the fewest tools that solve the task. Gate destructive actions behind confirmation, sandbox side effects, and rate-limit per session. Every tool is attack surface and failure surface; see reliable-agents-in-production.
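The confirmation gate and per-session rate limit could be sketched like this. The limits, the cancel_order tool name, and the authorize helper are assumptions chosen for illustration; sandboxing is not shown, and the state here is in-memory and single-process only.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque

DESTRUCTIVE_TOOLS = {"cancel_order"}  # hypothetical tool name
MAX_CALLS_PER_WINDOW = 5              # assumed limit
WINDOW_SECONDS = 60.0                 # assumed sliding-window length

# Timestamps of recent allowed calls, keyed by session ID.
_calls: defaultdict[str, deque[float]] = defaultdict(deque)


def authorize(
    session_id: str,
    tool_name: str,
    confirmed: bool = False,
    now: float | None = None,
) -> tuple[bool, str]:
    """Decide whether a tool call may run; return (allowed, reason)."""
    now = time.monotonic() if now is None else now
    recent = _calls[session_id]

    # Drop timestamps that have fallen out of the sliding window.
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()

    if len(recent) >= MAX_CALLS_PER_WINDOW:
        return False, "rate limit exceeded; retry later"
    if tool_name in DESTRUCTIVE_TOOLS and not confirmed:
        return False, f"{tool_name} requires user confirmation"

    # Only allowed calls count toward the limit.
    recent.append(now)
    return True, "ok"
```

The reason string doubles as a tool-result error: returning it to the model lets it ask the user for confirmation or back off, instead of repeating the denied call.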

Pitfalls

  • Vague tool names and descriptions; the model guesses and calls the wrong one.
  • Free-text arguments where an enum would do; constrain the inputs.
  • Trusting model-emitted arguments without validation.
  • Returning raw exceptions instead of actionable tool-result errors.

See function-calling and tool-call for the underlying mechanics.