Overview

/llms.txt is a single markdown file at the site root that gives LLM agents a curated, descriptive index of the site. This guide ships a working /llms.txt end to end: decide the structure, write the summaries, place the file at the build root, link it from robots.txt and ai.txt, and verify with curl. The standard itself is documented in llms-txt.

Prerequisites

  • A site with stable URLs. The file lists URLs; they should not move.
  • Pages with frontmatter that includes title, summary, and category. The summary becomes the per-link description.
  • A build pipeline that can emit a file at the site root. Quartz copies content/llms.txt straight to the build output; Astro and Next.js serve anything under public/ from the root.
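A minimal frontmatter block matching what this guide's generator reads (the page itself is hypothetical; title, summary, category, status, and slug are the field names used later in the build script):

```yaml
---
title: Writing Alt Text
summary: When and how to write image alt text that a screen reader can use.
category: writing
status: published
slug: alt-text
---
```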

Steps

1. Decide the structure

/llms.txt is one H1 for the site title, one blockquote for the site summary, then H2 sections per category with one bullet per page. Skip H3 unless the site has more than a few hundred pages.

# Site Name

> One- or two-paragraph blockquote describing what the site is and who it is for.

## Category

- [Page Title](https://example.com/category/slug): One-sentence summary.
- [Another Page](https://example.com/category/other): One-sentence summary.

## Another Category

- [Page](https://example.com/other/page): Summary.

Use absolute URLs. Relative paths break when an agent fetches the file in isolation.

2. Write the site summary

The blockquote after the H1 is the routing hint for agents. Keep it to two sentences. Name what the site is, who it serves, and what shape the content takes.

> Opinionated reference vault of best practices for dev, writing, and AI agent work. Pages are atomic, YAML-fronted, and stable enough to cite in agent prompts.

Do not market. Do not hedge. State what the site is.

3. Generate from frontmatter at build time

Hand-maintaining the file invites drift. Generate it from the same frontmatter the rest of the site uses. The skeleton, in Node:

// scripts/build-llms-txt.mjs
import fs from "node:fs/promises"
import { walkContent, readFrontmatter } from "./util.mjs"
 
const SITE = { title: "Site Name", summary: "One sentence." }
const BASE = "https://example.com"
 
const pages = await walkContent("content")
const byCat = new Map()
for (const p of pages) {
  const fm = await readFrontmatter(p)
  if (fm.status === "draft") continue
  const cat = fm.category ?? "uncategorized"
  if (!byCat.has(cat)) byCat.set(cat, [])
  byCat.get(cat).push({ title: fm.title, slug: fm.slug, summary: fm.summary })
}
 
const out = [`# ${SITE.title}`, "", `> ${SITE.summary}`, ""]
for (const [cat, items] of [...byCat.entries()].sort(([a], [b]) => a.localeCompare(b))) {
  out.push(`## ${cat[0].toUpperCase() + cat.slice(1)}`, "")
  for (const p of items.sort((a, b) => a.title.localeCompare(b.title))) {
    out.push(`- [${p.title}](${BASE}/${cat}/${p.slug}): ${p.summary}`)
  }
  out.push("")
}
await fs.writeFile("content/llms.txt", out.join("\n"))
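The walkContent and readFrontmatter helpers imported above are not from any library. One possible sketch, assuming flat key: value frontmatter:

```javascript
// scripts/util.mjs — one possible sketch of the two helpers; neither is
// from a library, and the frontmatter parser handles flat key: value pairs only.
import fs from "node:fs/promises"
import path from "node:path"

// Recursively collect every .md file under dir.
export async function walkContent(dir) {
  const out = []
  for (const entry of await fs.readdir(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name)
    if (entry.isDirectory()) out.push(...(await walkContent(full)))
    else if (entry.name.endsWith(".md")) out.push(full)
  }
  return out
}

// Naive frontmatter reader: grabs the first ---…--- block and splits each
// line on its first colon. Real sites should use a proper YAML parser.
export async function readFrontmatter(file) {
  const text = await fs.readFile(file, "utf8")
  const m = text.match(/^---\n([\s\S]*?)\n---/)
  const fm = {}
  if (m) {
    for (const line of m[1].split("\n")) {
      const i = line.indexOf(":")
      if (i > 0) fm[line.slice(0, i).trim()] = line.slice(i + 1).trim()
    }
  }
  return fm
}
```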

Wire this into prebuild so it runs before the site build.
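Assuming an npm-based build, npm runs a prebuild script automatically before build; the build command itself depends on your SSG (quartz build is shown as an example):

```json
{
  "scripts": {
    "prebuild": "node scripts/build-llms-txt.mjs",
    "build": "npx quartz build"
  }
}
```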

4. Place the file at the site root

The file must serve from https://yourdomain.com/llms.txt, not from a subpath. Each SSG has a different mechanism.

  • Quartz: drop content/llms.txt and it ships to public/llms.txt. See quartz.
  • Astro: put it in public/llms.txt.
  • Next.js: put it in public/llms.txt or write an API route.
  • Hugo: put it in static/llms.txt.

Serve it with Content-Type: text/plain or text/markdown. Most static hosts pick this up from the file extension.
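If the host needs a nudge, a Netlify-style _headers file can pin the type explicitly (the syntax varies by host; this is the Netlify form):

```
/llms.txt
  Content-Type: text/markdown; charset=utf-8
```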

5. Link it from robots.txt and ai.txt

Discovery improves when the file is referenced where crawlers already look.

# robots.txt
Sitemap: https://example.com/sitemap.xml
# llms.txt index for LLM agents: https://example.com/llms.txt

# ai.txt
User-Agent: *
Allow: /
# llms.txt index: https://example.com/llms.txt

The llms.txt comment lines are informational; some tools parse them. See ai-txt.

6. Consider /llms-full.txt

/llms-full.txt is an optional companion: the same index with full page bodies inlined. Ship it when the total content fits in a single context window (rule of thumb: under 500K tokens). For larger sites, omit it.
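One way to sanity-check the threshold is a rough characters-divided-by-four heuristic (an approximation, not a real tokenizer; estimateTokens is illustrative):

```javascript
// scripts/estimate-tokens.mjs — rough size check for /llms-full.txt.
// chars / 4 is a common rule-of-thumb ratio, not an exact token count.
import fs from "node:fs/promises"

export async function estimateTokens(file) {
  const text = await fs.readFile(file, "utf8")
  return Math.round(text.length / 4)
}
```

Run it against the generated file and compare the result with the 500K rule of thumb before deciding to ship /llms-full.txt.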

Verify it worked

Four checks.

# 1. The file is served at the root.
curl -sI https://yourdomain.com/llms.txt | head -1
# expected: HTTP/2 200
 
# 2. It starts with `# ` and contains a blockquote.
curl -s https://yourdomain.com/llms.txt | head -3
# expected: line 1 starts with "# ", line 3 starts with "> "
 
# 3. Every URL returns 200. Spot-check three.
curl -s https://yourdomain.com/llms.txt | \
  grep -oE 'https?://[^)]+' | sort -u | head -3 | \
  xargs -I {} curl -sI {} -o /dev/null -w "%{http_code} {}\n"
 
# 4. No duplicate URLs.
curl -s https://yourdomain.com/llms.txt | grep -oE 'https?://[^)]+' | sort | uniq -d
# expected: empty output

Common errors

  • File ends up at /static/llms.txt instead of /llms.txt. Move it to the build root.
  • Mixed relative and absolute URLs. Pick absolute and regenerate.
  • Summary line longer than one sentence. The summary is a routing hint; trim it.
  • Drafts and deprecated pages leak in. The generator must filter on status.
  • File regenerated by hand and forgotten on the next content change. Wire the generator into prebuild so the file is never stale.