Overview

Ollama runs open-weight LLMs on local hardware with a single binary and an OpenAI-compatible REST API. This guide installs Ollama, pulls a model, runs a chat, sets a system prompt, and shows two integration paths: the raw REST API and LangChain. For the trade-off against llama.cpp, see ollama-vs-llamacpp.

Prerequisites

  • macOS 13+ (Metal GPU), Linux with a GPU (NVIDIA or AMD with ROCm), or Linux CPU-only. Windows is supported via WSL2.
  • At least 8 GB RAM for 7B models; 16 GB for 13B; 32 GB for 30B+ models.
  • curl for API testing.
  • Python 3.11+ for the LangChain integration.

Steps

1. Install Ollama

macOS

brew install ollama

Or download the .dmg from ollama.com.

Linux

curl -fsSL https://ollama.com/install.sh | sh

This installs the binary and sets up a systemd service. Start it:

sudo systemctl start ollama
sudo systemctl enable ollama

Confirm Ollama is running:

ollama --version
curl http://localhost:11434/api/tags
# expected: {"models": [...]}

2. Pull a model

Choose a model based on available RAM:

Model          Size     Min RAM
llama3.2:3b    2.0 GB   8 GB
llama3.1:8b    4.7 GB   8 GB
mistral:7b     4.1 GB   8 GB
qwen2.5:14b    9.0 GB   16 GB

ollama pull llama3.2:3b

List installed models:

ollama list
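
The same list is available over the API. A minimal Python sketch with requests, assuming each entry returned by /api/tags carries name and size fields:

import requests

# Ask the local Ollama server which models are installed.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

for model in resp.json().get("models", []):
    size_gb = model.get("size", 0) / 1e9  # size is reported in bytes
    print(f"{model.get('name', '?'):<24} {size_gb:.1f} GB")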

3. Run an interactive chat

ollama run llama3.2:3b

Type at the >>> prompt. Exit the REPL with /bye or Ctrl+D.

4. Set a system prompt with a Modelfile

Create a custom model variant with a baked-in system prompt:

cat > Modelfile << 'EOF'
FROM llama3.2:3b
 
SYSTEM """
You are a concise technical writer. Reply in plain text only.
Lead with the rule, then the rationale.
Avoid em-dashes. Use periods or commas instead.
"""
EOF
 
ollama create mymodel -f Modelfile
ollama run mymodel

See system-prompts for rules on writing effective system prompts.
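
Baking the prompt into a model variant is optional. The chat endpoints also accept a system-role message per request; a minimal sketch with requests against the OpenAI-compatible endpoint covered in step 5, reusing the system prompt text from the Modelfile above:

import requests

payload = {
    "model": "llama3.2:3b",
    "messages": [
        {"role": "system", "content": "You are a concise technical writer. Reply in plain text only."},
        {"role": "user", "content": "What does a Modelfile do?"},
    ],
}
resp = requests.post("http://localhost:11434/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])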

5. Use the REST API

Ollama listens on port 11434 and serves both its native API (/api/...) and an OpenAI-compatible API (/v1/...).

Non-streaming

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain HNSW indexing in two sentences.",
  "stream": false
}'
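
The same request from Python, as a minimal sketch with requests (assuming the non-streaming reply carries the generated text in a response field, as the native API does):

import requests

payload = {
    "model": "llama3.2:3b",
    "prompt": "Explain HNSW indexing in two sentences.",
    "stream": False,  # one JSON object instead of a stream of chunks
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["response"])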

OpenAI-compatible /v1/chat/completions endpoint

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "What is RAG?"}]
  }'

The /v1 endpoint accepts the same payload as the OpenAI SDK. Swap the base URL to run any OpenAI SDK code against a local model.
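
For example, a minimal sketch with the official openai Python package; the api_key value here is a placeholder, since Ollama ignores it but the client requires one:

from openai import OpenAI

# Point the OpenAI client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "What is RAG?"}],
)
print(resp.choices[0].message.content)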

6. Integrate with LangChain

pip install langchain langchain-ollama

from langchain_ollama import OllamaLLM
 
llm = OllamaLLM(model="llama3.2:3b", base_url="http://localhost:11434")
 
response = llm.invoke("List three Postgres performance tips.")
print(response)
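
OllamaLLM is a LangChain Runnable, so streaming uses the standard stream method. A minimal sketch, reusing the llm object from above:

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in llm.stream("List three Postgres performance tips."):
    print(chunk, end="", flush=True)
print()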

For embeddings (useful in RAG pipelines):

from langchain_ollama import OllamaEmbeddings
 
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector = embeddings.embed_query("What is RAG?")
print(len(vector))  # 768 for nomic-embed-text
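
To see why these vectors matter for retrieval, compare two of them. A minimal sketch that computes cosine similarity with only the standard library:

import math

from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
a = embeddings.embed_query("What is RAG?")
b = embeddings.embed_query("Explain retrieval-augmented generation.")

# Cosine similarity: closer to 1.0 means the texts are semantically closer.
dot = sum(x * y for x, y in zip(a, b))
norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
print(round(dot / norm, 3))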

See rag for the full retrieval pipeline that consumes these embeddings.

Verify it worked

# 1. Ollama server is up.
curl -s http://localhost:11434/api/tags | python3 -m json.tool | head -5
 
# 2. Model is installed.
ollama list | grep llama3.2
 
# 3. Single-shot inference works.
ollama run llama3.2:3b "Reply with the word OK only." --nowordwrap
 
# 4. REST API responds.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"Say OK"}]}' | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])"

Common errors

  • connection refused on port 11434. The Ollama service is not running. Run ollama serve in a terminal or sudo systemctl start ollama.
  • pull model manifest: 404. The model name is wrong. Check ollama list and the Ollama model library.
  • Out-of-memory crash. The model does not fit in RAM or VRAM. Use a smaller quantized variant (e.g., llama3.1:8b-instruct-q4_0).
  • Slow responses on CPU. Without a GPU, 7B models run at 2 to 5 tokens per second. Use a 3B model for interactive speeds.
  • OllamaLLM not found in LangChain. Install langchain-ollama, not langchain-community; the Ollama integration moved out of the community package into its own dedicated package.