Overview

Neon is a serverless Postgres platform with native pgvector support. This guide enables the extension, creates a table to store embeddings, inserts vectors, builds an HNSW index for fast approximate nearest-neighbor search, and runs a cosine-similarity query. For the trade-off between pgvector and a dedicated vector database, see pinecone-vs-pgvector.

Prerequisites

  • A Neon account. Free tier provides one project with up to 3 GB storage.
  • psql installed locally or the Neon SQL editor in the browser.
  • Python 3.11+ and psycopg2 or asyncpg if you plan to insert via code.
  • An embedding model that produces fixed-length float vectors (e.g., text-embedding-3-small at 1536 dimensions). See embeddings for model choice guidance.
  • Your DATABASE_URL from the Neon project dashboard.

Steps

1. Enable the pgvector extension

Connect to your Neon database and run:

CREATE EXTENSION IF NOT EXISTS vector;

Confirm it is active:

SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
-- expected: vector | 0.7.0 (or later)

pgvector is pre-installed on all Neon projects; you only need to activate it per database.

2. Create the embeddings table

CREATE TABLE documents (
  id        BIGSERIAL PRIMARY KEY,
  content   TEXT        NOT NULL,
  metadata  JSONB,
  embedding VECTOR(1536) NOT NULL
);

Adjust the dimension (1536) to match your embedding model. A mismatched vector length at insert time raises a "different vector dimensions" error. See embeddings for the dimension-to-model mapping.
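A cheap guard on the client side surfaces dimension mismatches before they reach Postgres. A minimal sketch; EXPECTED_DIM and the helper name are illustrative, not part of any library:

```python
# Guard against dimension mismatches before data reaches Postgres.
EXPECTED_DIM = 1536  # must match VECTOR(1536) in the table definition

def validate_embedding(vec: list[float]) -> list[float]:
    """Raise early with a clear message instead of a DB error mid-batch."""
    if len(vec) != EXPECTED_DIM:
        raise ValueError(
            f"embedding has {len(vec)} dimensions, expected {EXPECTED_DIM}"
        )
    return vec

print(len(validate_embedding([0.0] * 1536)))  # → 1536
```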

3. Insert embeddings

Use the Python psycopg2 driver with the pgvector adapter:

import os
import psycopg2
from psycopg2.extras import Json
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(os.environ["DATABASE_URL"])
register_vector(conn)  # registers the vector type adapter on this connection

cur = conn.cursor()

# embedding_vector is a list of 1536 floats from your model.
# Wrap the metadata dict in Json(); psycopg2 does not adapt plain dicts to JSONB.
cur.execute(
    "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s)",
    ("The quick brown fox", Json({"source": "test"}), embedding_vector),
)
conn.commit()

For bulk inserts, use psycopg2.extras.execute_values or COPY rather than per-row execute calls, which pay a network round-trip each. See rag-vector-databases for batching strategies.
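A sketch of batched insertion, assuming rows is a list of (content, metadata, embedding) tuples; the chunking helper is plain Python, and the commented execute_values call sends each batch as a single statement:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Against a live connection (cur and conn from the snippet above):
# from psycopg2.extras import execute_values
# for batch in chunks(rows, 500):
#     execute_values(
#         cur,
#         "INSERT INTO documents (content, metadata, embedding) VALUES %s",
#         batch,
#     )
# conn.commit()

print([len(b) for b in chunks(range(1200), 500)])  # → [500, 500, 200]
```

A batch size of a few hundred rows is a reasonable starting point; tune it against your row width and network latency.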

4. Build an HNSW index

A sequential scan (the default) is accurate but slow at scale. HNSW (Hierarchical Navigable Small World) gives approximate nearest-neighbor search in sub-linear time.

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

Parameter guidance:

  • m = 16: maximum number of connections per node per graph layer. Higher improves recall but increases index size.
  • ef_construction = 64: build-time search depth. Higher improves recall, slows build.
  • Use vector_cosine_ops for cosine similarity (the standard for normalized embeddings from OpenAI and Voyage). Use vector_l2_ops for Euclidean distance.

See postgres-indexes for the general index-selection playbook.

5. Query by similarity

-- Retrieve the 5 documents closest to a query embedding.
SELECT
  id,
  content,
  metadata,
  1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;

The <=> operator is cosine distance (1 minus similarity). Use <-> for Euclidean distance and <#> for negative inner product (for dot-product similarity with normalized vectors).
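These operators correspond to distances you can reproduce locally. A small numpy sketch showing cosine distance (<=>), Euclidean distance (<->), and negative inner product (<#>) on two toy vectors:

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# <=> : cosine distance = 1 - cosine similarity
cosine_distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# <-> : Euclidean (L2) distance
euclidean = np.linalg.norm(a - b)
# <#> : negative inner product (negated so smaller still means closer)
neg_inner = -np.dot(a, b)

print(cosine_distance, euclidean, neg_inner)  # 1.0, ~1.414, -0.0
```

For these orthogonal vectors the cosine similarity is 0, so the cosine distance is 1; all three operators agree that smaller values mean closer vectors, which is why each works directly in an ORDER BY.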

In Python:

import numpy as np

query_vec = get_embedding("What does the fox do?")

# Pass a numpy array so the register_vector adapter serializes it as a vector;
# a plain Python list is adapted as a Postgres array, which <=> cannot compare.
cur.execute(
    "SELECT id, content FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (np.array(query_vec),),
)
results = cur.fetchall()

6. Set ef_search at query time for precision tuning

SET hnsw.ef_search = 100;  -- Default is 40; higher improves recall at query cost.

Set this per session or per query. For RAG pipelines where recall matters more than latency, start at 100 and tune down until latency meets the SLA.
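Tuning ef_search requires a recall metric: compare the approximate top-k against an exact top-k (e.g., the same query with the index disabled via SET enable_indexscan = off). A minimal sketch; the two id lists are assumed to come from your own queries:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Exact top-5 vs. approximate top-5 that missed one result.
print(recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 9, 5]))  # → 0.8
```

Run this over a sample of representative queries at each candidate ef_search value, then pick the smallest value that meets your recall target.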

Verify it worked

-- 1. Extension is active.
SELECT extname FROM pg_extension WHERE extname = 'vector';
 
-- 2. Table exists with the correct column type.
\d documents
 
-- 3. HNSW index is on the embedding column.
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'documents';
 
-- 4. Similarity query returns rows.
SELECT id, content FROM documents ORDER BY embedding <=> (SELECT embedding FROM documents LIMIT 1) LIMIT 5;

Common errors

  • type "vector" does not exist. The extension was enabled in a different database. Run CREATE EXTENSION IF NOT EXISTS vector in the correct database.
  • different vector dimensions (X) != (Y). The inserted vector length does not match the column definition. Confirm the embedding model dimension and update the column definition if needed.
  • HNSW index not used by the query planner. Run EXPLAIN (ANALYZE, BUFFERS) and confirm Index Scan using ... on documents. If a Seq Scan appears, the table may be too small for the planner to prefer the index, or the ORDER BY expression may not match the indexed operator class (e.g., the index was built with vector_cosine_ops but the query uses <->).
  • Insert is very slow on Neon. Neon auto-suspends the compute after five minutes of inactivity. The first connection after a suspend incurs a cold-start delay of one to three seconds.
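To absorb the cold-start delay, wrap the first connection attempt in a short retry loop with backoff. A sketch under the assumption that the callable passed in is something like a zero-argument wrapper around psycopg2.connect; the flaky stub below only simulates a cold start for demonstration:

```python
import time

def connect_with_retry(connect, attempts=3, base_delay=1.0):
    """Retry a connection callable, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Stub demo: fails twice (simulated cold start), then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("connection refused")
    return "conn"

print(connect_with_retry(flaky, attempts=4, base_delay=0.01))  # → conn
```

In production, catch only the driver's connection errors (e.g., psycopg2.OperationalError) rather than bare Exception.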