Overview
Ollama runs open-weight LLMs on local hardware with a single binary and a REST API that includes OpenAI-compatible endpoints. This guide installs Ollama, pulls a model, runs a chat, sets a system prompt, and shows two integration paths: the raw REST API and LangChain. For the trade-off against llama.cpp, see ollama-vs-llamacpp.
Prerequisites
- macOS 13+ (Metal GPU), Linux with a GPU (NVIDIA or AMD with ROCm), or Linux CPU-only. Windows is supported via WSL2.
- At least 8 GB RAM for 7B models; 16 GB for 13B; 32 GB for 30B+ models.
- `curl` for API testing.
- Python 3.11+ for the LangChain integration.
Steps
1. Install Ollama
macOS
```
brew install ollama
```
Or download the .dmg from ollama.com.
Linux
```
curl -fsSL https://ollama.com/install.sh | sh
```
This installs the binary and sets up a systemd service. Start it:
```
sudo systemctl start ollama
sudo systemctl enable ollama
```
Confirm Ollama is running:
```
ollama --version
curl http://localhost:11434/api/tags
# expected: {"models": [...]}
```
2. Pull a model
Choose a model based on available RAM:
| Model | Size | Min RAM |
|---|---|---|
| llama3.2:3b | 2.0 GB | 8 GB |
| llama3.1:8b | 4.7 GB | 8 GB |
| mistral:7b | 4.1 GB | 8 GB |
| qwen2.5:14b | 9.0 GB | 16 GB |
```
ollama pull llama3.2:3b
```
List installed models:
```
ollama list
```
3. Run an interactive chat
```
ollama run llama3.2:3b
```
Type at the `>>>` prompt. Press Ctrl+D to exit, or type /bye inside the REPL to quit cleanly.
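The REPL keeps the conversation history for you; the HTTP API does not, so a script has to resend prior turns itself. A minimal stdlib-only sketch against the native /api/chat endpoint, assuming the server is on the default port and the model pulled above:
```python
# Programmatic equivalent of the REPL: the server is stateless, so the
# client resends the conversation history on every turn.
import json
import urllib.request

history = []

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    body = json.dumps({
        "model": "llama3.2:3b",
        "messages": history,
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["message"]
    history.append(reply)  # keep the assistant turn for the next call
    return reply["content"]

print(chat("What is HNSW?"))
print(chat("Shorter, please."))  # sees the previous turn via history
```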
4. Set a system prompt with a Modelfile
Create a custom model variant with a baked-in system prompt:
```
cat > Modelfile << 'EOF'
FROM llama3.2:3b
SYSTEM """
You are a concise technical writer. Reply in plain text only.
Lead with the rule, then the rationale.
Avoid em-dashes. Use periods or commas instead.
"""
EOF
ollama create mymodel -f Modelfile
ollama run mymodel
```
See system-prompts for rules on writing effective system prompts.
5. Use the REST API
Ollama listens on port 11434, exposing its native REST API plus OpenAI-compatible endpoints under /v1.
Non-streaming
```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain HNSW indexing in two sentences.",
  "stream": false
}'
```
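Streaming
Streaming is the default: leave "stream" unset and /api/generate returns one JSON object per line as tokens are generated. A stdlib-only consumer sketch, again assuming the default port and the model pulled above:
```python
# Streaming consumer for /api/generate: the response body is one
# standalone JSON object per line until a final object with "done": true.
import json
import urllib.request

body = json.dumps({
    "model": "llama3.2:3b",
    "prompt": "Explain HNSW indexing in two sentences.",
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:                      # one JSON object per line
        chunk = json.loads(line)
        print(chunk["response"], end="", flush=True)
        if chunk["done"]:                  # final object carries stats
            print()
            break
```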
OpenAI-compatible /v1/chat/completions endpoint
```
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "What is RAG?"}]
  }'
```
The /v1 endpoint accepts the same payload as the OpenAI SDK. Swap the base URL to run any OpenAI SDK code against a local model.
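As a sketch of that swap, using the official openai Python package (`pip install openai`); the api_key argument is required by the SDK but ignored by Ollama:
```python
# Point the official OpenAI Python SDK at the local Ollama server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "What is RAG?"}],
)
print(resp.choices[0].message.content)
```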
6. Integrate with LangChain
```
pip install langchain langchain-ollama
```
```python
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="llama3.2:3b", base_url="http://localhost:11434")
response = llm.invoke("List three Postgres performance tips.")
print(response)
```
For embeddings (useful in RAG pipelines), pull the embedding model first with `ollama pull nomic-embed-text`, then:
```python
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector = embeddings.embed_query("What is RAG?")
print(len(vector))  # 768 for nomic-embed-text
```
See rag for the full retrieval pipeline that consumes these embeddings.
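In miniature, that pipeline ranks candidate passages by cosine similarity against the query embedding. A hedged sketch; the passages and the ranking loop below are illustrative, not from the rag guide:
```python
# Miniature retrieval: embed a query and a few candidate passages,
# then rank the passages by cosine similarity to the query.
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")

passages = [
    "RAG augments generation with retrieved documents.",
    "Postgres uses MVCC for concurrency control.",
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

query_vec = embeddings.embed_query("What is RAG?")
passage_vecs = embeddings.embed_documents(passages)

ranked = sorted(zip(passages, passage_vecs),
                key=lambda p: cosine(query_vec, p[1]), reverse=True)
print(ranked[0][0])  # the passage most similar to the query
```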
Verify it worked
```
# 1. Ollama server is up.
curl -s http://localhost:11434/api/tags | python3 -m json.tool | head -5

# 2. Model is installed.
ollama list | grep llama3.2

# 3. Single-shot inference works.
ollama run llama3.2:3b "Reply with the word OK only." --nowordwrap

# 4. REST API responds.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"Say OK"}]}' | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['choices'][0]['message']['content'])"
```
Common errors
- `connection refused` on port 11434. The Ollama service is not running. Run `ollama serve` in a terminal or `sudo systemctl start ollama`.
- `pull model manifest: 404`. The model name is wrong. Check `ollama list` and the Ollama model library.
- Out-of-memory crash. The model does not fit in RAM or VRAM. Use a smaller quantized variant (e.g., `llama3.1:8b-instruct-q4_0`).
- Slow responses on CPU. Without a GPU, 7B models run at 2 to 5 tokens per second. Use a 3B model for interactive speeds.
- `OllamaLLM` not found in LangChain. Install `langchain-ollama`, not `langchain-community`. Ollama support moved out of the community package into a dedicated one.