Structured data for AI crawlers

Overview

Structured data for AI crawlers is JSON-LD that tells a model what a page is and which real-world entities it covers, so the crawler can parse those facts as data rather than infer them from prose. The schemas that carry the most weight are TechArticle (or Article), BreadcrumbList, and DefinedTerm, with entities grounded through sameAs. Markup is a parsing aid, not a ranking trick: a crawler trusts it only when the visible content backs it. This page sits under llm-seo-best-practices; for the full schema catalog see structured-data and schema-markup-deep.

Ship Article or TechArticle on every content page

Give each reference page an Article or TechArticle block with headline, description, datePublished, dateModified, author, and publisher. This lets a crawler attribute the claim, date it, and resolve the author entity. Date fields matter most for fast-moving topics, where a model should prefer the fresher source.

{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Structured data for AI crawlers",
  "description": "Which JSON-LD schemas help AI crawlers parse and cite a page.",
  "datePublished": "2026-06-07",
  "dateModified": "2026-06-07",
  "author": { "@type": "Person", "name": "Clem", "sameAs": ["https://github.com/AXIA-Enterprises"] },
  "publisher": { "@type": "Organization", "name": "LLM Best Practices" }
}

Ground entities with sameAs

A crawler disambiguates “Astro” the framework from “Astro” the product by following sameAs to a Wikipedia or Wikidata URL. Mark the primary entity with DefinedTerm or an about array of Thing objects, each carrying sameAs. Entity grounding is what maps your page to the correct concept in the model’s world. See llm-seo-best-practices.
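As a minimal sketch of the about pattern described above, the fragment below grounds two entities that this page itself covers. The choice of entities and the Wikipedia URLs are illustrative assumptions, not the site's actual markup; confirm that each URL resolves to the intended concept before shipping.

```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Structured data for AI crawlers",
  "about": [
    {
      "@type": "Thing",
      "name": "JSON-LD",
      "sameAs": "https://en.wikipedia.org/wiki/JSON-LD"
    },
    {
      "@type": "Thing",
      "name": "Schema.org",
      "sameAs": "https://en.wikipedia.org/wiki/Schema.org"
    }
  ]
}
```

An ambiguous name such as “Astro” would follow the same shape, with sameAs pointing at whichever Wikipedia or Wikidata entry identifies the intended sense.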

Use BreadcrumbList for position

Emit BreadcrumbList so the crawler reads the page’s place in the site hierarchy. It reinforces the topical cluster the page belongs to and gives the engine a clean trail back to the pillar.
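A breadcrumb trail for this page might look like the sketch below. The example.com domain, the URL paths, and the display name of the pillar page are hypothetical placeholders; only the pillar slug llm-seo-best-practices and the site name come from this page.

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "LLM Best Practices",
      "item": "https://example.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "LLM SEO best practices",
      "item": "https://example.com/llm-seo-best-practices"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Structured data for AI crawlers",
      "item": "https://example.com/structured-data-for-ai-crawlers"
    }
  ]
}
```

Positions start at 1 and run from the site root down to the current page, so the second item is the pillar the trail leads back to.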

Reserve FAQPage for genuine Q&A

Use FAQPage only where the visible page actually consists of questions and answers. In August 2023 Google limited FAQ rich results to well-known government and health sites, but the markup still helps a crawler segment question-answer pairs. On a page that is not Q&A, FAQ markup is noise the classifier discounts. See schema-markup-deep.
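For a page that really is Q&A, the block could be shaped like the sketch below. The question and answer text are hypothetical; in real markup each pair should match wording that appears in the visible page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Should every page carry FAQPage markup?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. Use FAQPage only where the visible page is actually questions and answers."
      }
    }
  ]
}
```

Each additional visible question becomes one more Question object in mainEntity.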

Validate, then trust nothing unbacked

Validate every block with the Schema Markup Validator (validator.schema.org) and Google's Rich Results Test. The hard rule: markup that the prose does not support earns nothing and can erode trust. Keep JSON-LD and visible content in sync. See add-jsonld-to-static-site and the crawler reference in ai-crawlers.

Pitfalls

Hand-maintaining JSON-LD per page. Generate it from frontmatter instead, so it cannot drift from the content.
Stuffing terms into the keywords property or inventing entities with no sameAs. Unresolvable entities are ignored.
Confusing schema markup with structured-output from an LLM; one describes a web page, the other constrains a model response.
