Overview
A modern public site ships a dozen or so small files at well-known paths that crawlers, agents, and tooling expect to find without negotiation. Get them wrong and you lose indexing, AI-training opt-out, instant-crawl pings, search-engine verification, and accessibility metadata. This page is the master catalog: every file, its required location, its purpose, the rules for the body, and a link to the deep-dive. Use it as the pre-launch checklist and the post-launch audit list.
The catalog
| File | Path | Purpose | Required if |
|---|---|---|---|
| robots.txt | /robots.txt | Crawl scope and sitemap pointer | Always |
| sitemap.xml | /sitemap.xml | Enumerate every canonical URL with lastmod | Always |
| llms.txt | /llms.txt | Agent-facing index of canonical pages | LLM/agent traffic matters |
| llms-full.txt | /llms-full.txt | Full text dump of priority pages, for agent ingestion | Vault sites and reference docs |
| ai.txt | /ai.txt | AI training opt-out declaration | You care about training-set inclusion |
| IndexNow key | /<32-char-key>.txt | Proves ownership for IndexNow pings | Using IndexNow on Bing or Yandex |
| security.txt | /.well-known/security.txt | Vulnerability disclosure contact (RFC 9116) | Always for production sites |
| humans.txt | /humans.txt | Credit the people behind the site | Optional, low-cost |
| manifest.json | /manifest.json or /manifest.webmanifest | PWA install metadata, icons, theme | Mobile traffic matters |
| favicon.ico | /favicon.ico | Browser tab icon, SERP favicon | Always |
| OG default image | /og-default.png (1200×630) | Fallback social card | Always |
| Apple touch icon | /apple-touch-icon.png (180×180) | iOS home-screen icon | iOS traffic matters |
Rules per file
robots.txt
Minimum body:
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Use one Sitemap: line per sitemap. Only Disallow: paths that genuinely should not be crawled. Referencing llms-txt from robots.txt as a comment is a common pattern: # llms.txt: https://example.com/llms.txt. Deep dive: technical.
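The two rules above can be checked mechanically. The following is a minimal sketch, assuming a local copy of the file; `check_robots` is a hypothetical helper name, and it flags any bare `Disallow: /` line, including ones you may have added on purpose for a specific crawler, so treat a failure as a prompt to review rather than a verdict.

```shell
# check_robots FILE: succeed only if FILE has an absolute Sitemap: URL and
# contains no bare "Disallow: /" rule. (Hypothetical helper, not a standard tool.)
check_robots() {
  file=$1
  # At least one absolute Sitemap: URL must be present.
  grep -Eiq '^Sitemap:[[:space:]]*https?://' "$file" || {
    echo "missing Sitemap: line" >&2
    return 1
  }
  # A bare "Disallow: /" blocks everything for its user-agent group.
  if grep -Eiq '^Disallow:[[:space:]]*/[[:space:]]*$' "$file"; then
    echo "blanket Disallow: / found" >&2
    return 1
  fi
  return 0
}
```

Run it against the built output, for example `check_robots public/robots.txt`, where `public/` stands in for whatever your build directory is.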
sitemap.xml
List every canonical URL with a real <lastmod> taken from the page’s last_updated. Regenerate on build; never hand-edit. Split into multiple sitemaps with a sitemap index when a file would exceed 50,000 URLs or 50 MB uncompressed. Submit once in Google Search Console and Bing Webmaster Tools; engines refetch on their own. Pair with indexnow for sub-hour change notification.
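A rough audit of a generated sitemap can be sketched as follows. It assumes a plain (non-index) sitemap on disk; `count_tag` and `audit_sitemap` are hypothetical helper names. Comparing tag counts is only a heuristic: equal counts suggest, but do not prove, that every `<url>` carries its own `<lastmod>`.

```shell
# count_tag FILE TAG: number of opening <TAG> elements in FILE.
count_tag() {
  grep -o "<$2>" "$1" | wc -l | tr -d '[:space:]'
}

# audit_sitemap FILE: print the counts, then fail if the <url> and <lastmod>
# counts differ or the file exceeds the 50,000-URL per-file limit.
audit_sitemap() {
  urls=$(count_tag "$1" url)
  lastmods=$(count_tag "$1" lastmod)
  echo "urls=$urls lastmod=$lastmods"
  [ "$urls" -eq "$lastmods" ] || return 1
  [ "$urls" -le 50000 ] || return 1
}
```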
llms.txt
Agent-facing markdown index of the site’s canonical pages, grouped by section, each with a one-sentence summary. Lives at /llms.txt, served as text/plain or text/markdown. Deep dive and authoring rules: llms-txt. End-to-end howto: ship-llms-txt. Auditing an existing one: audit-llms-txt-with-claude.
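One way to spot entries that lack their one-sentence summary is sketched below. It assumes entries are written as markdown list items of the form `- [Title](url): summary`, which is how the llms.txt proposal lays them out; if your authoring rules differ, the patterns need adjusting. `links_missing_summary` is a hypothetical helper name.

```shell
# links_missing_summary FILE: print list entries shaped like "- [Title](url)"
# that are not followed by ": summary text".
links_missing_summary() {
  grep -E '^[-*][[:space:]]+\[[^]]+\]\([^)]+\)' "$1" |
    grep -Ev '\)[[:space:]]*:[[:space:]]*[^[:space:]]'
}
```

An empty result means every link entry has some summary text; the function does not judge whether that text is actually one sentence.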
llms-full.txt
The full text content of the priority pages, concatenated, served at /llms-full.txt. Designed for agents that want to ingest the whole vault in one fetch rather than crawl page by page. Cap at ~2-3 MB to stay friendly to context windows. Generate at build from the same frontmatter that drives /llms.txt.
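The size cap is easy to enforce at build time. A minimal sketch, assuming the file is available locally and taking 3 MB (3 × 1024 × 1024 bytes) as the default limit; `check_size` is a hypothetical helper name.

```shell
# check_size FILE [MAX_BYTES]: print the size and fail if FILE is larger than
# MAX_BYTES (default 3145728 bytes, i.e. 3 MB, the upper end of the cap).
check_size() {
  max=${2:-3145728}
  size=$(wc -c < "$1" | tr -d '[:space:]')
  echo "$size bytes (limit $max)"
  [ "$size" -le "$max" ]
}
```

Calling `check_size public/llms-full.txt` as the last build step (with `public/` standing in for your output directory) makes an oversized dump fail the build instead of shipping silently.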
ai.txt
Declares the site’s stance on AI training-set inclusion. Spawning’s proposed format covers Disallow: and Allow: directives per crawler. Deep dive: ai-txt. Cross-link from robots.txt as a comment.
IndexNow key file
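A flat text file at /<key>.txt whose body is the key itself, proving ownership. The IndexNow protocol accepts keys of 8 to 128 characters (letters, digits, and dashes); a 32-character hex key is the usual choice. Required before Bing or Yandex will accept IndexNow pings on the domain. Deep dive: indexnow.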
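Generating and verifying the key file can be sketched like this. The helper names are hypothetical, and reading 16 random bytes from `/dev/urandom` is just one way to get a 32-character hex key; any key that meets the protocol's length and character rules works.

```shell
# new_indexnow_key: print a random 32-character lowercase hex key.
new_indexnow_key() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# write_key_file DIR KEY: create DIR/KEY.txt whose body is exactly KEY.
write_key_file() {
  printf '%s' "$2" > "$1/$2.txt"
}

# verify_key_file PATH: succeed if the file body matches its own file name.
verify_key_file() {
  name=$(basename "$1" .txt)
  body=$(tr -d '[:space:]' < "$1")
  [ "$name" = "$body" ]
}
```

Generate the key once and commit the file; regenerating it on every build would invalidate the key the search engines already know.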
security.txt
Per RFC 9116, place at /.well-known/security.txt. Required fields: Contact:, Expires:. Optional: Preferred-Languages:, Canonical:, Acknowledgments:. RFC 9116 recommends an Expires: value less than a year out, so refresh the file at least annually to keep it in the future.
Contact: mailto:security@example.com
Expires: 2027-01-01T00:00:00Z
Preferred-Languages: en
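A stale Expires: is easy to miss, so a build-time check is worth having. The sketch below assumes the timestamp is written in UTC in the `YYYY-MM-DDThh:mm:ssZ` form used above and that the field name is capitalized exactly as `Expires:`; timestamps with numeric offsets or fractional seconds would need real date parsing. `expires_ok` is a hypothetical helper name.

```shell
# expires_ok FILE: succeed if FILE has an Expires: field that is still in the future.
# Assumes a UTC timestamp in the form YYYY-MM-DDThh:mm:ssZ.
expires_ok() {
  exp=$(sed -n 's/^Expires:[[:space:]]*//p' "$1" | head -n 1)
  [ -n "$exp" ] || return 1
  # Reduce both timestamps to YYYYMMDDhhmmss so they compare as plain integers.
  exp_num=$(printf '%s' "$exp" | tr -cd '0-9')
  now_num=$(date -u +%Y%m%d%H%M%S)
  [ "$exp_num" -gt "$now_num" ]
}
```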
humans.txt
Credit the people, tools, and stack behind the site. Free-form text. No standard schema; the convention is Team:, Tools:, Thanks: sections.
manifest.json
PWA metadata: app name, icons, theme color, display mode, start URL. Reference from HTML: <link rel="manifest" href="/manifest.webmanifest">. Validate with the Application panel in Chrome DevTools.
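For orientation, a minimal manifest covering the fields listed above might be written out like this. Every value is a placeholder, the icon paths `/icon-192.png` and `/icon-512.png` are assumptions rather than files the catalog requires, and `write_manifest` is a hypothetical helper name.

```shell
# write_manifest FILE: write a minimal web app manifest with placeholder values.
write_manifest() {
  cat > "$1" <<'EOF'
{
  "name": "Example Site",
  "short_name": "Example",
  "start_url": "/",
  "display": "standalone",
  "theme_color": "#ffffff",
  "background_color": "#ffffff",
  "icons": [
    { "src": "/icon-192.png", "sizes": "192x192", "type": "image/png" },
    { "src": "/icon-512.png", "sizes": "512x512", "type": "image/png" }
  ]
}
EOF
}
```

Whichever of the two catalog paths you write it to, the href in the HTML link tag has to match.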
favicon.ico, OG image, Apple touch icon
Static binary assets at fixed paths. Reference each from the HTML <head>:
<link rel="icon" href="/favicon.ico">
<link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
<meta property="og:image" content="https://example.com/og-default.png">

Per-page OG images override the default; see og-images for the dynamic-generation pattern.
Pre-launch checklist
Run this once before flipping DNS to production:
HOST="https://example.com"
for path in /robots.txt /sitemap.xml /llms.txt /llms-full.txt /ai.txt \
    /.well-known/security.txt /humans.txt /manifest.webmanifest \
    /favicon.ico /og-default.png /apple-touch-icon.png; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$HOST$path")
  echo "$status $path"
done

Every line should return 200. Investigate any non-200 and fix before the launch announcement. End-to-end build wiring for a Quartz site: static-site-seo.
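To use the same check as a CI gate rather than eyeballing the output, the loop's "status path" lines can be piped through a small filter. This is a sketch under that assumption; `fail_on_non_200` is a hypothetical helper name.

```shell
# fail_on_non_200: read "STATUS PATH" lines on stdin, echo each failure,
# and return non-zero if any status is not 200.
fail_on_non_200() {
  bad=0
  while read -r status path; do
    [ -n "$status" ] || continue
    if [ "$status" != "200" ]; then
      echo "FAIL $status $path"
      bad=1
    fi
  done
  return "$bad"
}
```

Ending the loop with `done | fail_on_non_200` makes the pipeline exit non-zero on the first launch-blocking miss. Redirects count as failures here, which is usually what you want for files that must live at an exact path.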