Overview
ai.txt is a proposed policy file for declaring how AI systems may train on, access, or summarize a site’s content. Treat it as the AI-era counterpart to robots.txt. The proposal is not a ratified standard; adoption is partial, but the cost to ship one is low and the signal it sends to AI vendors is clear.
ai.txt is a policy file, not an index
The distinction matters. Three files, three jobs, all served at the site root.
- ai.txt says what AI systems may and may not do with the content. It is policy.
- llms.txt says what is worth reading and how to find it. It is a curated index.
- robots.txt says what general-purpose crawlers may fetch. It governs crawl access, not training policy.
Ship all three on any site that expects agent traffic. They do not overlap and do not substitute for each other.
Serve ai.txt at the site root with text/plain
The file lives at https://<your-domain>/ai.txt and serves with Content-Type: text/plain. Static hosts (GitHub Pages, Vercel, Netlify, Cloudflare Pages) serve it directly when placed at the build root. The path is fixed; do not nest it under /.well-known/ or any other prefix.
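A quick check after each deploy catches a missing file or a wrong content type before agents do. A minimal sketch in Python, assuming the third-party requests package; example.com stands in for your domain:

```python
# Deploy-time sanity check: ai.txt served at the root, HTTP 200, text/plain.
# A sketch, not a required tool. Some hosts mishandle HEAD; switch to
# requests.get if yours does.
import requests

resp = requests.head("https://example.com/ai.txt", allow_redirects=True, timeout=10)
assert resp.status_code == 200, f"ai.txt not served: HTTP {resp.status_code}"

ctype = resp.headers.get("Content-Type", "")
assert ctype.startswith("text/plain"), f"unexpected Content-Type: {ctype!r}"
```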
Use line-based directives, one per line
The format mirrors robots.txt: user-agent blocks followed by directives. The proposal defines a small directive set; agents that respect the file read the directives that apply to their declared user-agent and fall back to *.
Common directives, with the meaning each conveys:
- User-Agent: the agent or vendor the block applies to. * matches all.
- Disallow: path prefix the agent must not access for the declared purpose.
- Allow: path prefix that overrides a broader Disallow.
- Train: whether the agent may use the content for model training. yes or no.
- Quote: whether the agent may quote the content in responses. yes or no.
- Summarize: whether the agent may summarize the content. yes or no.
- Contact: an email or URL for licensing or escalation.
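Allow/Disallow precedence works as in robots.txt: a more specific Allow carves an exception out of a broader Disallow. A hypothetical block (ExampleBot is a placeholder name):

```
User-Agent: ExampleBot
Disallow: /docs/
Allow: /docs/public/
Train: no
```

Here ExampleBot may access /docs/public/ for the declared purposes but nothing else under /docs/, and may not train on any of it.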
Example file
```
# https://example.com/ai.txt
# Policy for AI systems. Last updated 2026-05-14.

User-Agent: *
Train: no
Summarize: yes
Quote: yes
Contact: ai-policy@example.com

User-Agent: GPTBot
Train: no
Disallow: /paid/
Disallow: /members/

User-Agent: ClaudeBot
Train: no
Allow: /blog/
Disallow: /paid/

User-Agent: Google-Extended
Train: no
```
The intent here: anyone may summarize and quote public pages, no one may train on them, and two specific bots are additionally restricted from paid sections. Google-Extended is Google's training-only token; including it covers Gemini and Vertex AI training even when Googlebot itself is allowed for search.
Pair ai.txt with robots.txt and meta tags, never alone
ai.txt is advisory. Compliant vendors honor it; non-compliant ones ignore it. Defense in depth:
- Mirror critical restrictions in robots.txt for the bot's documented crawler name (GPTBot, ClaudeBot, CCBot, Google-Extended, PerplexityBot); see the sketch after this list.
- Add <meta name="robots" content="noai, noimageai"> on pages that must not be ingested at all.
- Enforce paid or private content at the application layer; do not rely on policy files alone.
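robots.txt can express fetch access but not purpose, so the mirror covers the path restrictions plus a full block for the training-only Google-Extended token. A sketch mirroring the example policy above:

```
# robots.txt — mirrors the restrictions from the ai.txt example
User-agent: GPTBot
Disallow: /paid/
Disallow: /members/

User-agent: ClaudeBot
Disallow: /paid/

User-agent: Google-Extended
Disallow: /
```

Disallow: / under Google-Extended withholds everything from Gemini and Vertex AI training while leaving Googlebot and search untouched.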
For the index counterpart that helps compliant agents find your good content, see llms-txt. For broader agent guidance on this site, see for-ai-agents.
Validate the file as part of the build
Lint at build time. Minimum checks:
- File exists at the build root.
- Each non-comment line matches a known directive or User-Agent: header.
- Path prefixes start with /.
- Train, Quote, and Summarize values are yes or no.
There is no official validator. A short regex pass and a deploy-time HEAD check are enough.
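A minimal lint pass in Python, matching the directive set described above; the file name and the strictness of the regex are assumptions to adapt:

```python
# Build-time lint for ai.txt — a sketch, not an official validator.
# Accepts the directive variant described on this page; anything else fails.
import re
import sys
from pathlib import Path

DIRECTIVE = re.compile(
    r"^(User-Agent:\s*\S+"            # block header; * or a bot name
    r"|Allow:\s*/\S*"                 # path prefixes must start with /
    r"|Disallow:\s*/\S*"
    r"|(Train|Quote|Summarize):\s*(yes|no)"
    r"|Contact:\s*\S+)$",
    re.IGNORECASE,
)

def lint(path: str = "ai.txt") -> int:
    """Return the number of malformed lines in the file."""
    errors = 0
    for n, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if not DIRECTIVE.match(line):
            print(f"{path}:{n}: unrecognized line: {line}")
            errors += 1
    return errors

if __name__ == "__main__":
    sys.exit(1 if lint() else 0)
```

Wire it into CI so a malformed line fails the build; pair it with the deploy-time HEAD check shown earlier.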
Treat the proposal as moving ground
The spec is in flux. The directive set above reflects the most widely adopted variant as of this page’s date. Re-check the proposal source annually and bump last_updated when the file’s semantics change. Adoption tracks vendor policy, not the spec; the practical signal is whichever directive each major lab publicly honors.