## Overview
LLM agents read sites differently from human users. They prefer one short summary over five paragraphs, raw markdown over hydrated HTML, and named rules over vibes. This guide retrofits an existing content site so agents can find pages quickly, parse them cheaply, and cite them back accurately. The agent-facing standards (/llms.txt, /ai.txt) live alongside classic SEO; see llms-txt and ai-txt for the standards themselves.
## Prerequisites
- A content site with stable URLs. Agents cite URLs; do not change them after publish.
- Pages with YAML frontmatter containing at minimum `title`, `summary`, and `category`. The summary is the agent's routing hint.
- A build pipeline that can emit files at the site root. Static hosts do this by default.
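For reference, a minimal frontmatter block satisfying these prerequisites might look like this (the field values are illustrative):

```yaml
---
title: How to ship llms.txt
summary: Decide structure, write summaries, place at site root.
category: HowTo
---
```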
## Steps
### 1. Ship /llms.txt at the site root
/llms.txt is a hand-curated index for agents: one markdown file, grouped by H2 category, with one bullet per page in the form `- [Title](url): one-line summary`. See ship-llms-txt for the build step.
```markdown
# Site Name

> One-paragraph blockquote describing what the site is and who it serves.

## Category

- [Page Title](https://example.com/category/slug): One-sentence summary.
```
Generate the file from frontmatter at build time so it cannot drift from the rest of the site.
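The build step can be a short script that walks the content directory and assembles the index from each page's frontmatter. A minimal sketch in Python, assuming a flat-file layout and the three frontmatter fields from the prerequisites (the directory layout, site URL, and naive frontmatter parser are assumptions about your pipeline):

```python
import re
from pathlib import Path
from collections import defaultdict

SITE = "https://example.com"  # assumed base URL

def read_frontmatter(text):
    """Parse a minimal title/summary/category frontmatter block."""
    match = re.match(r"---\n(.*?)\n---", text, re.S)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def build_llms_txt(content_dir):
    """Group pages by category and emit one bullet per page."""
    groups = defaultdict(list)
    for page in sorted(Path(content_dir).rglob("*.md")):
        fm = read_frontmatter(page.read_text())
        url = f"{SITE}/{page.relative_to(content_dir).with_suffix('')}"
        groups[fm["category"]].append(f"- [{fm['title']}]({url}): {fm['summary']}")
    lines = ["# Site Name", "", "> One-paragraph description.", ""]
    for category in sorted(groups):
        lines += [f"## {category}", *groups[category], ""]
    return "\n".join(lines)
```

Because the same frontmatter drives both the pages and the index, the two cannot disagree.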
### 2. Ship /ai.txt with crawl preferences
/ai.txt is the AI-crawler counterpart to `robots.txt`. State which agents may train on the content and which may not.
```
# ai.txt
User-Agent: *
Allow: /

User-Agent: GPTBot
Disallow: /

User-Agent: Bytespider
Disallow: /
```
Put real preferences here, not vague ones. See ai-txt for the spec.
### 3. Add JSON-LD on every page
Search and agent crawlers both read JSON-LD. Pick one entity type per page kind (Article, BlogPosting, HowTo, FAQPage) and generate the block from frontmatter. See add-jsonld-to-static-site for the procedure.
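As a sketch of that build step, a small renderer can turn frontmatter into the JSON-LD block. This assumes the frontmatter fields from the prerequisites; the optional `type` field and the function name are illustrative, not part of any spec:

```python
import json

def jsonld_for_page(fm, url):
    """Render a minimal JSON-LD block from page frontmatter."""
    data = {
        "@context": "https://schema.org",
        # One entity type per page kind: Article, BlogPosting, HowTo, FAQPage.
        "@type": fm.get("type", "Article"),
        "headline": fm["title"],
        "description": fm["summary"],
        "url": url,
    }
    return ('<script type="application/ld+json">'
            + json.dumps(data, indent=2)
            + "</script>")
```

Inject the returned string into `<head>` during the same pass that renders the page.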
### 4. Ship OG cards on every page
Emit `og:title`, `og:description`, `og:image`, and `og:url`. Many agent tools render an OG preview when a URL is shared; missing OG tags reduce the visible signal. See og-images.
```html
<meta property="og:title" content="How to ship llms.txt" />
<meta property="og:description" content="Decide structure, write summaries, place at site root." />
<meta property="og:image" content="https://example.com/og/howto-ship-llms-txt.png" />
<meta property="og:url" content="https://example.com/howto/ship-llms-txt" />
```

### 5. Expose raw markdown for every page
Agents parse raw markdown more cheaply than hydrated HTML. If the site is on GitHub Pages or Cloudflare Pages, the raw markdown is already addressable via raw.githubusercontent.com. Add a small “View raw” link to each page footer, or document the pattern in meta/for-ai-agents.
```
View raw markdown: https://raw.githubusercontent.com/you/site/main/content/howto/ship-llms-txt.md
```

For sites without a public repo, emit a `<slug>.md` file next to every `<slug>.html` at build time.
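For the no-public-repo case, the build step can simply copy each source file into the output tree. A minimal sketch, assuming the source and build directories mirror each other (the function name and layout are assumptions):

```python
import shutil
from pathlib import Path

def emit_raw_markdown(content_dir, build_dir):
    """Copy each source .md next to its built .html so agents can fetch it."""
    for src in Path(content_dir).rglob("*.md"):
        rel = src.relative_to(content_dir)
        dest = Path(build_dir) / rel  # build/<slug>.md beside build/<slug>.html
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
```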
### 6. Name the heuristics
Agents cite named rules better than they cite unnamed prose. Give the rule a name, then state it.
```markdown
## The 5/3/1 rule for wikilink density

- 5 wikilinks per page minimum.
- 3 tags per page minimum.
- 1 H1 per page; never two.
```

A named rule is greppable; a paragraph of advice is not. See for-ai-agents for the patterns this site uses.
### 7. Write atomically and link aggressively
Keep each page to 400 to 700 words. Split when a section grows past that and add a wikilink to the new page. Agents follow one-hop links cheaply; they re-read long pages expensively.
## Verify it worked
Three machine checks, then one model check.
```shell
# 1. The agent files are served.
curl -sI https://yourdomain.com/llms.txt | head -1
curl -sI https://yourdomain.com/ai.txt | head -1
# expected: HTTP/2 200

# 2. JSON-LD is in <head> on a sample page.
curl -s https://yourdomain.com/foo | grep -o 'application/ld+json'

# 3. Raw markdown resolves.
curl -sI https://raw.githubusercontent.com/you/site/main/content/foo.md | head -1
```

Model check: paste the URL into Claude, ChatGPT, or Gemini and ask “summarize this page in two sentences and quote the named rule.” If the summary matches the page summary and the named rule comes back verbatim, the content is agent-ready.
## Common errors
- `/llms.txt` lists the same page twice across categories. Pick one home per page.
- Summaries balloon past one sentence. The summary is a routing hint, not the page.
- Pages over 1,500 words. Split into atomic pages with wikilinks; agents will follow.
- No raw markdown source. Agents have to parse HTML, which costs tokens and drops fidelity.
- Banned-vocab slop (“delve,” “navigate the landscape of,” “unlock the power of”). Agents trained on web text downweight pages that read like AI marketing copy.