AI Crawler User-Agents

Overview

Match the bot to the behavior before you allow or block it. AI crawlers split into three jobs: training ingestion, search indexing for an AI answer engine, and live user-triggered fetches. Blocking one does not block the others, and the user-agent strings change as vendors split their fleets. This card lists the agents that matter as of 2026 and the exact robots.txt token for each. For where these tokens live and how they interact with llms.txt and ai.txt, see discoverability-files.

Know the three crawler jobs

Treat the job, not the brand, as the unit of control.

Training: ingests pages to train or fine-tune a model. Examples: GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot.
Search indexing: builds the index an AI answer engine cites. Examples: OAI-SearchBot, Claude-SearchBot, PerplexityBot.
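User fetch: retrieves a single page because a user asked a chatbot about it. Examples: ChatGPT-User, Claude-User, Perplexity-User. These are not bulk crawlers, and robots.txt compliance varies by vendor: Anthropic says Claude-User honors it, while Perplexity documents that Perplexity-User generally ignores it for user-requested fetches.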
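To bucket access-log hits by job rather than by brand, a lookup from token to job is enough. The sketch below makes these assumptions:

- A case-insensitive substring match on the token is good enough for your logs.
- `JOB_BY_TOKEN` and `crawler_job` are hypothetical names.
- The sample headers are illustrative, not copied from vendor documentation.

Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens, not request User-Agents. A match tells you only what the client claims to be, since User-Agent headers can be spoofed.

```python
from __future__ import annotations

# Hypothetical token-to-job map built from the reference table.
# Google-Extended and Applebot-Extended are omitted: they are robots.txt
# control tokens and never appear in a request's User-Agent header.
JOB_BY_TOKEN = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
    "Bytespider": "training",
    "Meta-ExternalAgent": "training",
    "OAI-SearchBot": "search-index",
    "Claude-SearchBot": "search-index",
    "PerplexityBot": "search-index",
    "ChatGPT-User": "user-fetch",
    "Claude-User": "user-fetch",
    "Perplexity-User": "user-fetch",
    "MistralAI-User": "user-fetch",
}


def crawler_job(user_agent_header: str) -> str | None:
    """Return the job of the first known token found in the header, or None.

    Matching is a case-insensitive substring test, so it reports what the
    client claims to be; it does not verify the caller.
    """
    header = user_agent_header.lower()
    for token, job in JOB_BY_TOKEN.items():
        if token.lower() in header:
            return job
    return None


# Illustrative headers only.
print(crawler_job("Mozilla/5.0 (compatible; GPTBot/1.1)"))        # training
print(crawler_job("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))  # user-fetch
print(crawler_job("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))  # None
```

No token in the map is a substring of another, so the first match is unambiguous for these agents. Re-check that property whenever you add tokens.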

Reference table

| Token | Operator | Job | robots.txt group line |
|---|---|---|---|
| `GPTBot` | OpenAI | Training | `User-agent: GPTBot` |
| `OAI-SearchBot` | OpenAI | Search index | `User-agent: OAI-SearchBot` |
| `ChatGPT-User` | OpenAI | User fetch | `User-agent: ChatGPT-User` |
| `ClaudeBot` | Anthropic | Training | `User-agent: ClaudeBot` |
| `Claude-SearchBot` | Anthropic | Search index | `User-agent: Claude-SearchBot` |
| `Claude-User` | Anthropic | User fetch | `User-agent: Claude-User` |
| `Google-Extended` | Google | Training (Gemini); robots.txt token only, not a separate crawler | `User-agent: Google-Extended` |
| `Googlebot` | Google | Search index | `User-agent: Googlebot` |
| `PerplexityBot` | Perplexity | Search index | `User-agent: PerplexityBot` |
| `Perplexity-User` | Perplexity | User fetch | `User-agent: Perplexity-User` |
| `Applebot-Extended` | Apple | Training; robots.txt token only, not a separate crawler | `User-agent: Applebot-Extended` |
| `Bytespider` | ByteDance | Training | `User-agent: Bytespider` |
| `CCBot` | Common Crawl | Training corpus | `User-agent: CCBot` |
| `Amazonbot` | Amazon | Search and assistant | `User-agent: Amazonbot` |
| `Meta-ExternalAgent` | Meta | Training | `User-agent: Meta-ExternalAgent` |
| `cohere-ai` | Cohere | Training and inference | `User-agent: cohere-ai` |
| `MistralAI-User` | Mistral | User fetch | `User-agent: MistralAI-User` |
| `Diffbot` | Diffbot | Knowledge graph | `User-agent: Diffbot` |
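Because the table assigns each agent a job, you can generate a robots.txt from a per-job policy instead of editing it by hand. The sketch below makes these assumptions:

- The policy is a three-key dict of job to allow/deny.
- `AGENTS_BY_JOB` and `build_robots_txt` are hypothetical names.
- The agent lists are a subset of the table.

Agents whose job does not fit the three buckets cleanly (Amazonbot, cohere-ai, Diffbot) are left out. Googlebot is also left out on purpose, because disallowing it removes the site from Google Search, not only from AI features.

```python
from __future__ import annotations

# Hypothetical subset of the reference table, grouped by job.
# Googlebot is deliberately absent: disallowing it drops the site from
# Google Search, not only from AI features.
AGENTS_BY_JOB = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended",
                 "CCBot", "Bytespider", "Meta-ExternalAgent"],
    "search-index": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "user-fetch": ["ChatGPT-User", "Claude-User", "Perplexity-User", "MistralAI-User"],
}


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Emit one group per agent, each User-agent line paired with a rule.

    policy maps a job name to True (allow) or False (disallow); every job
    in AGENTS_BY_JOB must be present or a KeyError is raised.
    """
    groups = []
    for job, agents in AGENTS_BY_JOB.items():
        rule = "Allow: /" if policy[job] else "Disallow: /"
        for agent in agents:
            groups.append(f"User-agent: {agent}\n{rule}")
    # Separate groups with a blank line so each agent's rule is easy to audit.
    return "\n\n".join(groups) + "\n"


# Block training, keep search-index and user-fetch access.
print(build_robots_txt({"training": False, "search-index": True, "user-fetch": True}))
```

Emitting one group per agent, rather than stacking several User-agent lines over a shared rule, costs a few extra lines. In return, every agent's rule is visible next to its name.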

Allow or block with full rules, not a bare token

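Set the policy per agent, then state the directive. A User-agent line with no Allow or Disallow under it is ambiguous. Under RFC 9309 it shares the rules of whatever group follows it, because consecutive User-agent lines form one group, even across blank lines. Other parsers simply drop it. Always pair each User-agent line with an explicit rule.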
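Python's standard-library parser shows why a bare line is risky. In this sketch the bare `User-agent: GPTBot` line is followed by a blank line. RFC 9309 would attach GPTBot to the next group's Disallow, but CPython's `urllib.robotparser` discards the bare line and lets GPTBot through. Pairing the line with its own rule removes the disagreement. Real crawlers use their own parsers, so this illustrates one parser's behavior only. The `blocked` helper and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A bare User-agent line followed by a blank line.
BARE = """\
User-agent: GPTBot

User-agent: OAI-SearchBot
Disallow: /
"""

# The same intent with every User-agent line paired with a rule.
PAIRED = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
"""


def blocked(robots_txt: str, agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if Python's robots.txt parser refuses agent for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(blocked(BARE, "GPTBot"))    # False: the bare line was dropped
print(blocked(PAIRED, "GPTBot"))  # True
```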

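```
# Allow GPTBot everywhere (this site's posture, repeated for each agent)
User-agent: GPTBot
Allow: /
```

Use one posture or the other, never both in the same file. RFC 9309 merges groups that name the same agent, and when an Allow and a Disallow rule match equally the Allow should win, so GPTBot would stay allowed.

```
# Block training but keep search-index access
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
```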

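Blocking GPTBot keeps your pages out of future OpenAI training crawls but leaves ChatGPT-User and OAI-SearchBot free to fetch and cite you. To stay out of AI answers entirely, block the search-index and user-fetch agents too. Then confirm in each operator's docs that its user-fetch agent actually honors the rule.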
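The effect described above can be checked against a parser before deploying. The sketch below makes these assumptions:

- `allowed_agents` is a hypothetical helper and example.com is a placeholder.
- It uses Python's `urllib.robotparser`.
- The result reflects only what the file says. It cannot show whether a user-fetch agent actually obeys it.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

OPENAI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]

# Block training only: GPTBot is refused, OAI-SearchBot is explicitly
# allowed, and ChatGPT-User has no group, so it defaults to allowed.
BLOCK_TRAINING = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

# Stay out of OpenAI's AI answers: one group naming all three agents.
BLOCK_ALL = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str],
                   url: str = "https://example.com/") -> list[str]:
    """Return the agents that the file permits to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


print(allowed_agents(BLOCK_TRAINING, OPENAI_AGENTS))  # ['OAI-SearchBot', 'ChatGPT-User']
print(allowed_agents(BLOCK_ALL, OPENAI_AGENTS))       # []
```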

Watch for fleet splits and stale tokens

Vendors rename and split agents, so a list written for last year's names silently misses this year's agents. Anthropic retired Claude-Web in favor of Claude-User and Claude-SearchBot; Google separated Google-Extended (training) from Googlebot (search). Re-check operator docs each quarter, and prefer an explicit allow for each agent you want over a single wildcard. This site allows every agent because each page is built to be cited; see for-ai-agents and the curated index at llms-txt. For crawl-volume tradeoffs on large sites, see crawl-budget and ai-search-optimization.
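Part of the quarterly re-check can be automated by comparing the User-agent lines in a robots.txt against a current list and a retired list. The sketch below makes these assumptions:

- `CURRENT_AGENTS`, `RETIRED_AGENTS`, `declared_agents`, and `audit` are hypothetical names.
- The current list is the reference table.
- The retired list holds Claude-Web plus anthropic-ai. Treating anthropic-ai as retired is an assumption to confirm against Anthropic's documentation.

Comparison is case-insensitive, matching how robots.txt treats user-agent names.

```python
from __future__ import annotations

# Tokens from the reference table; refresh from operator docs each quarter.
CURRENT_AGENTS = {
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "Google-Extended", "Googlebot",
    "PerplexityBot", "Perplexity-User",
    "Applebot-Extended", "Bytespider", "CCBot", "Amazonbot",
    "Meta-ExternalAgent", "cohere-ai", "MistralAI-User", "Diffbot",
}

# Retired tokens and their replacements. anthropic-ai is an assumption
# here; confirm against Anthropic's current crawler documentation.
RETIRED_AGENTS = {
    "Claude-Web": "Claude-User, Claude-SearchBot",
    "anthropic-ai": "ClaudeBot",
}


def declared_agents(robots_txt: str) -> set[str]:
    """Collect every User-agent value, lowercased, ignoring comments."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def audit(robots_txt: str) -> tuple[list[str], list[str]]:
    """Return (retired tokens still present, current tokens never named)."""
    declared = declared_agents(robots_txt)
    stale = sorted(t for t in RETIRED_AGENTS if t.lower() in declared)
    missing = sorted(t for t in CURRENT_AGENTS if t.lower() not in declared)
    return stale, missing


stale, missing = audit("User-agent: Claude-Web\nAllow: /\n\nUser-agent: GPTBot\nAllow: /\n")
print(stale)                   # ['Claude-Web']
print("ClaudeBot" in missing)  # True
```

Agents reported as missing fall through to whatever the wildcard group says. That is exactly the leak an explicit per-agent allowlist is meant to prevent.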
