Overview
A Modelfile is a declarative configuration file that turns a base model into a named local image with a pinned system prompt, generation parameters, and template. It is the Ollama equivalent of a Dockerfile: check it in alongside your agent code so the model and the prompt move together. Every production agent that runs on Ollama should be built from a Modelfile, not from a bare ollama run call.
Start every Modelfile with a pinned FROM tag
FROM specifies the base model. Use the full model tag including quantization. A tag without a quantization suffix can resolve to a different file when the registry adds a new variant.
FROM qwen2.5-coder:32b-instruct-q4_K_M
- Always include the quantization suffix (q4_K_M, q8_0, etc.). See ollama-quantization for the quantization levels.
- You can also FROM a local GGUF file path: FROM /models/llama3.3-q4.gguf. Use this when distributing models in air-gapped environments.
- The base model must be present on the host before ollama create runs. Pull it first with ollama pull.
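The pinning rule can be checked mechanically before a Modelfile is merged. The following is a minimal sketch, not an official tag grammar: the function name is_pinned and the suffix pattern are assumptions made for illustration, and the pattern only recognises common suffix shapes such as q4_K_M, q8_0, and fp16.

```python
import re

# Assumed convention: a pinned tag ends in a quantization suffix such as
# q4_K_M, q8_0, or fp16. This pattern is illustrative, not an official grammar.
QUANT_SUFFIX = re.compile(
    r"(?:^|[-_])(?:q\d+(?:_[A-Za-z0-9]+)*|fp16|f16|f32)$", re.IGNORECASE
)


def is_pinned(from_value: str) -> bool:
    """Return True for a local GGUF path or a tag that ends in a quantization suffix."""
    value = from_value.strip()
    # A local file path names exactly one file, so treat it as pinned.
    if value.startswith(("/", "./", "../")) or value.lower().endswith(".gguf"):
        return True
    # A registry reference needs an explicit tag after the last colon.
    _name, sep, tag = value.rpartition(":")
    if not sep or not tag:
        return False
    return QUANT_SUFFIX.search(tag) is not None
```

A check like this could run in CI against the FROM line of every Modelfile in the repository, rejecting values such as qwen2.5-coder:32b or a bare model name.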
Write the SYSTEM instruction as a single stable block
SYSTEM sets the system prompt that every conversation starts with. Keep it focused on role, output format, and hard constraints.
FROM qwen2.5-coder:32b-instruct-q4_K_M
SYSTEM """
You are a code reviewer for a TypeScript monorepo.
Return your review as a JSON object with keys:
- "summary": one sentence
- "issues": list of {file, line, severity, message}
- "approved": boolean
Do not include any text outside the JSON object.
"""
See system-prompts for the rules on system prompt structure and structured-output for format constraints. Avoid dynamic content in the system prompt; dynamic context belongs in the user message.
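To illustrate the split between a stable system prompt and dynamic user content, here is a minimal sketch of the calling side. It assumes the agent talks to Ollama's /api/chat endpoint; the helper names build_chat_request and parse_review, the file path, and the diff text are hypothetical. The request carries no system message because the SYSTEM block is already baked into the image.

```python
import json


def build_chat_request(model: str, file_path: str, diff: str) -> dict:
    """Build a request body for Ollama's /api/chat endpoint.

    Only a user message is sent; the system prompt comes from the Modelfile.
    """
    user_content = f"Review the following change to {file_path}:\n\n{diff}"
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": False,
    }


def parse_review(raw: str) -> dict:
    """Parse the model's reply and check the keys the SYSTEM block asks for."""
    review = json.loads(raw)
    if not isinstance(review, dict):
        raise ValueError("review must be a JSON object")
    missing = {"summary", "issues", "approved"} - review.keys()
    if missing:
        raise ValueError(f"review is missing keys: {sorted(missing)}")
    if not isinstance(review["approved"], bool):
        raise ValueError("'approved' must be a boolean")
    return review
```

Validating the reply on the caller's side is a design choice of this sketch: the system prompt states the format, and the agent still verifies it rather than trusting the model.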
Set PARAMETER values to match the task
PARAMETER lines override the model’s default generation settings. Set only the parameters your task requires.
PARAMETER temperature 0.2
PARAMETER num_ctx 16384
PARAMETER num_predict 2048
PARAMETER top_p 0.9
PARAMETER stop "</review>"
Key parameters and their effects:
- temperature: 0.0 to 0.2 for extraction and classification; 0.7 to 1.0 for creative tasks.
- num_ctx: the context window in tokens. Size it for the largest prompt you actually send plus the expected output, not the model’s maximum.
- num_predict: maximum output tokens. Always set this in batch pipelines to prevent runaway generation.
- stop: one or more stop sequences. Useful for structured output templates where the model should stop after a closing delimiter.
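A small check can confirm that a Modelfile sets the parameters a batch pipeline depends on. This is a rough sketch under stated assumptions: parse_parameters and missing_required are hypothetical helpers, the parser only looks at single-line PARAMETER entries, and it would misread a line inside a SYSTEM block that happens to start with the word PARAMETER.

```python
def parse_parameters(modelfile_text: str) -> dict[str, list[str]]:
    """Collect PARAMETER lines as name -> list of values (stop may repeat)."""
    params: dict[str, list[str]] = {}
    for line in modelfile_text.splitlines():
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].upper() == "PARAMETER":
            # Drop surrounding double quotes, e.g. stop "</review>".
            value = parts[2].strip().strip('"')
            params.setdefault(parts[1], []).append(value)
    return params


def missing_required(
    params: dict[str, list[str]],
    required: tuple[str, ...] = ("num_ctx", "num_predict"),
) -> list[str]:
    """Return the required parameter names that the Modelfile does not set."""
    return [name for name in required if name not in params]
```

The required set here (num_ctx and num_predict) is an assumption chosen to match the batch-pipeline advice above; adjust it to whatever your task needs.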
Use TEMPLATE only when the base model requires a custom chat format
Most models in the Ollama registry ship with the correct chat template embedded in the GGUF file. Override TEMPLATE only when the default is wrong or absent.
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
<|assistant|>
{{ end }}{{ .Response }}<|end|>"""
Incorrect templates are the most common source of “the model ignores instructions” bugs. When your image produces lower-quality output than you would expect from a bare ollama run of the base model, compare your template to the one in the original model card.
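Comparing templates by eye is error-prone because a missing delimiter or newline is easy to overlook. The sketch below uses Python's standard difflib to show line-level differences between two template strings. The function name template_diff is hypothetical, and how you obtain the reference template (for example, by copying it from the model card) is left to you.

```python
import difflib


def template_diff(reference: str, candidate: str) -> list[str]:
    """Return unified-diff lines between a reference template and your own.

    An empty list means the two templates are identical line for line.
    Whitespace is not normalised, because it can matter in chat templates.
    """
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            candidate.splitlines(),
            fromfile="reference",
            tofile="candidate",
            lineterm="",
        )
    )
```

Printing the returned lines with "\n".join(...) gives a familiar patch-style view of where the custom template departs from the reference.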
Build and test the image with ollama create and ollama run
ollama create code-reviewer -f Modelfile
ollama run code-reviewer "Review this function: function add(a, b) { return a + b }"
The ollama create step compiles the Modelfile and registers the image locally. It does not re-download the base model if it is already present.
- Name images by role and version: code-reviewer-v2, support-triage-v1.
- Check the Modelfile into version control. The image itself is local to the host; the Modelfile is portable.
- When updating a Modelfile, increment the version in the name. Do not overwrite a running image in production; create the new image and switch traffic.
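The naming rule lends itself to a small helper in a release script. This is a sketch under assumptions: next_image_name and create_command are hypothetical names, the role-vN pattern follows the examples above, and an unversioned name is treated as version 1 so that its successor is v2. The command list mirrors the ollama create invocation shown earlier and is only built here, not executed.

```python
import re


def next_image_name(current: str) -> str:
    """Return the next versioned name, e.g. code-reviewer-v2 -> code-reviewer-v3."""
    match = re.fullmatch(r"(?P<role>.+)-v(?P<version>\d+)", current)
    if match is None:
        # Assumption: an unversioned name counts as version 1.
        return f"{current}-v2"
    return f"{match['role']}-v{int(match['version']) + 1}"


def create_command(name: str, modelfile_path: str = "Modelfile") -> list[str]:
    """Build the argument list for `ollama create <name> -f <Modelfile>`."""
    return ["ollama", "create", name, "-f", modelfile_path]
```

A deploy script could pass the result of create_command(next_image_name("code-reviewer-v2")) to subprocess.run, then switch traffic to the new name once the image has been tested.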
List, inspect, and clean up images
ollama list # All local images and their sizes
ollama show code-reviewer # Modelfile and parameters for a specific image
ollama rm code-reviewer-v1 # Remove an old image to reclaim disk space
Images from Modelfiles share the base model layers on disk. Removing code-reviewer-v1 does not remove the underlying qwen2.5-coder:32b-instruct-q4_K_M layers; those are shared with any other image built from the same base.