Metadata & Structured Data That LLMs Actually Read

Q: Does JSON-LD structured data help AI engines cite my page?

Barely. Large-scale measurement shows schema markup has close to zero effect on whether AI engines cite a page — within a few percent of no effect, sometimes negative. JSON-LD is still worth adding because it powers Google rich results, but it is not a lever for AI citations. Quotable content, visible dates, and entity authority are.

Q: What metadata do large language models actually read?

Models read the rendered HTML of the page far more than any single metadata field: the title, the headings, the body text, visible dates, tables and lists. Title tags and meta descriptions still shape how a page is understood and displayed, and visible publish/update dates support the freshness signal, but the content itself is what gets retrieved and quoted.

Q: What is llms.txt and do I need it?

llms.txt is a proposed plain-text file at your site root that describes your site and points to key resources for AI agents. It is a low-cost bonus that some third-party agents may use, but Google has stated you do not need special AI files to appear in generative search. Treat it as optional polish after the fundamentals, not as a requirement.

Few topics generate more bad advice than "metadata for AI." The pitch is seductive: add the right markup and the machines will cite you. The evidence says otherwise. Let's separate what models genuinely read from what merely feels productive.

The hierarchy of what gets read

Rank the inputs by how much they influence whether you are retrieved and quoted:

Signal	Read by models?	What it's really for
Rendered body content	Heavily	Retrieval and citation — the main event
Headings (H1–H3)	Yes	Structure, retrievable answer units
Visible dates on page	Yes	Freshness signal (~40% of citability)
Title tag	Yes	Topic understanding + SERP display
Meta description	Lightly	SERP snippet; weak ranking influence
JSON-LD structured data	Minimally for citations	Google rich results
llms.txt / AI files	Some third-party agents	Optional bonus, not required

The schema-for-citations myth

This is the correction that saves the most wasted effort. Adding JSON-LD does not meaningfully increase AI citations. Ahrefs tracked 1,885 pages that added schema between August 2025 and March 2026 and measured citations across AI Overviews, AI Mode and ChatGPT: the markup produced no meaningful uplift on any platform (Ahrefs, 2026).

There's a tempting counter-statistic: cited pages are almost 3× more likely to carry JSON-LD than non-cited pages. But that's correlation, not cause — authoritative pages tend to do everything well, schema included. The controlled before/after test is the one that settles it, and it found no lift.

None of this means "skip schema." It means deploy schema for the right reason: Google rich results. FAQ rich snippets, article cards, breadcrumb trails and product stars all come from structured data and are worth having. Just don't expect AI visibility to follow from markup. It follows from content.

The metadata that does pay off

Title tags

Still one of the highest-leverage fields. Front-load the primary term, keep it under ~60 characters, and write it as the answer to a query, not a brand slogan. Models use it to understand topic; humans use it to choose the click.
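The two mechanical rules above (length and front-loading) are easy to lint. Here is a minimal Python sketch; the function name `check_title` and the 30-character "front window" are assumptions made for illustration, not an official threshold from any search engine.

```python
from typing import List


def check_title(title: str, primary_term: str,
                max_chars: int = 60, front_window: int = 30) -> List[str]:
    """Return a list of problems with a title tag (an empty list means it passes).

    Assumption: "front-loaded" is read as "the primary term starts within the
    first `front_window` characters". Adjust to taste.
    """
    problems = []
    if len(title) > max_chars:
        problems.append(f"too long: {len(title)} characters (limit {max_chars})")

    position = title.lower().find(primary_term.lower())
    if position == -1:
        problems.append("primary term missing")
    elif position > front_window:
        problems.append(f"primary term starts at character {position}, not front-loaded")
    return problems


if __name__ == "__main__":
    print(check_title("JSON-LD for AI Citations: What Actually Works", "JSON-LD"))
```

A check like this only catches the mechanical failures. Whether the title reads as the answer to a query is still a human judgment.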

Visible dates

Freshness is roughly 40% of citability, and models read the rendered date, not just the JSON-LD field. Show "Updated June 2026" in the page body. A genuinely updated, dated page beats an undated one with identical facts.
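To confirm that a date is actually visible in the rendered text and not only present in a JSON-LD field, you can strip the non-rendered elements and search what remains. This is a rough sketch using only the Python standard library; the class and function names are hypothetical, and the date pattern (English month names after "Updated", "Last updated" or "Published") is an assumption you would widen for your own templates.

```python
import re
from html.parser import HTMLParser
from typing import Optional

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches e.g. "Updated June 2026" or "Published: June 3, 2026".
DATE_RE = re.compile(
    rf"\b(?:Last updated|Updated|Published)\b:?\s+"
    rf"((?:{MONTHS})\s+(?:\d{{1,2}},\s+)?\d{{4}})",
    re.IGNORECASE,
)


class VisibleText(HTMLParser):
    """Collects text that sits outside non-rendered elements."""
    SKIP = {"script", "style", "template", "noscript", "title"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def find_visible_date(html: str) -> Optional[str]:
    """Return the first visible publish/update date, or None if there is none."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    match = DATE_RE.search(" ".join(parser.chunks))
    return match.group(1) if match else None
```

A page whose only date lives in `dateModified` inside a script tag returns `None` here, which is exactly the case worth fixing.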

Meta descriptions

A weak ranking signal but a real influence on click-through from classic SERPs. Write conversationally and accurately; don't keyword-stuff. Google often rewrites them anyway from the best-matching passage.

Open Graph & canonical

Open Graph controls how your link renders when shared — including inside some AI and social surfaces. A canonical tag prevents duplicate-URL dilution. Both are hygiene, not growth levers, but missing them creates avoidable problems.
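Because these tags are hygiene, the useful check is simply "are any missing?". The sketch below scans a page for a canonical link and the four properties the Open Graph protocol lists as required (og:title, og:type, og:image, og:url). The function name `missing_hygiene_tags` is hypothetical, and the code assumes it is given the HTML as a string.

```python
from html.parser import HTMLParser
from typing import List

# The four properties the Open Graph protocol marks as required.
REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class HeadTags(HTMLParser):
    """Records the canonical URL and any og:* meta properties."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.og = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link":
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.canonical = attributes.get("href")
        elif tag == "meta":
            prop = attributes.get("property") or ""
            if prop.startswith("og:"):
                self.og[prop] = attributes.get("content") or ""


def missing_hygiene_tags(html: str) -> List[str]:
    """Return the names of required tags that are absent or empty."""
    parser = HeadTags()
    parser.feed(html)
    parser.close()
    missing = [prop for prop in REQUIRED_OG if not parser.og.get(prop)]
    if not parser.canonical:
        missing.append("canonical")
    return missing
```

An empty result means the basics are in place. It says nothing about whether the canonical URL points at the right page, which still needs a manual look.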

How to implement JSON-LD correctly

If you do add structured data — and you should, for rich results — there is exactly one correct pattern: inline JSON-LD in the server-rendered HTML.

<script type="application/ld+json">
{ "@context": "https://schema.org", "@type": "Article", ... }
</script>

Common ways to get it wrong:

Injected after hydration (via client JS or useEffect) — invisible to the initial crawl.
Microdata on a collapsed accordion whose answer text only renders when it is opened: the answer isn't in the server HTML, so Google sees a Question with no Answer and rejects the rich result.
Empty description — silently rejected.
Mixing microdata and JSON-LD on the same entity — contradictory signals.

Validate with the Google Rich Results Test and confirm the script appears in view source, not just the live DOM.
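The "view source, not live DOM" check can be scripted. The sketch below takes the raw HTML exactly as the server returned it (before any JavaScript runs), extracts every application/ld+json script, and flags three of the failure modes listed above: no JSON-LD at all, unparseable JSON, and an empty description. The names `JsonLdBlocks` and `audit_jsonld` are hypothetical, and the sketch does not follow nested `@graph` arrays or replace the Rich Results Test.

```python
import json
from html.parser import HTMLParser
from typing import List


class JsonLdBlocks(HTMLParser):
    """Collects the text of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.blocks.append("".join(self.buffer))
            self.in_jsonld = False


def audit_jsonld(raw_html: str) -> List[str]:
    """Return a list of findings for the server-rendered HTML (empty = clean)."""
    parser = JsonLdBlocks()
    parser.feed(raw_html)
    parser.close()

    if not parser.blocks:
        # Typical symptom of markup injected client-side after hydration.
        return ["no JSON-LD in server HTML"]

    findings = []
    for text in parser.blocks:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            findings.append("invalid JSON")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            kind = item.get("@type", "unknown")
            if "description" in item and not str(item["description"]).strip():
                findings.append(f"{kind}: empty description")
    return findings
```

Feed it the response body from a plain HTTP fetch (for example `urllib.request.urlopen(url).read().decode()`), not HTML saved from browser dev tools, since the latter reflects the hydrated DOM.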

The AI-specific files (optional bonus)

Files like /llms.txt and /.well-known/mcp.json describe your site to third-party agents. Some may use them; Google has explicitly stated you do not need special AI files to appear in generative search. Treat them as cheap polish after the fundamentals. Separately, a Content-Signal line in robots.txt lets you declare how AI crawlers may use your content (search, retrieval, training). It states a preference and depends on crawlers choosing to honor it.
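For the robots.txt part, here is a small sketch that renders a Content-Signal group from three yes/no choices. The key names `search`, `ai-input` and `ai-train` and the comma-separated `key=yes|no` syntax reflect the Content Signals proposal as I understand it, so treat them as an assumption and check the current specification before deploying. The function name `render_robots_txt` is hypothetical.

```python
def render_robots_txt(search: bool, ai_input: bool, ai_train: bool) -> str:
    """Render a minimal robots.txt group with a Content-Signal line.

    Assumed mapping to the three uses named in the text:
      search   -> appearing in search results
      ai-input -> retrieval (content used as input to AI answers)
      ai-train -> training
    """
    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"

    signal = (f"search={yes_no(search)}, "
              f"ai-input={yes_no(ai_input)}, "
              f"ai-train={yes_no(ai_train)}")
    lines = [
        "User-Agent: *",
        f"Content-Signal: {signal}",
        "Allow: /",
        "",  # trailing newline at end of file
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Allow search and retrieval, opt out of training.
    print(render_robots_txt(search=True, ai_input=True, ai_train=False))
```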

Metadata is plumbing, not magic. Get the title, the visible date and the rendered content right, add schema for rich results, and stop chasing markup as a citation hack. The machine reads your words first.

Frequently asked questions

Does JSON-LD structured data help AI engines cite my page?

Barely. Measurement shows schema markup has near-zero effect on AI citations — within a few percent of nothing, sometimes negative. JSON-LD is still worth adding for Google rich results, but it is not a citation lever. Quotable content, visible dates and entity authority are.

What metadata do large language models actually read?

Models read the rendered HTML far more than any single field: title, headings, body, visible dates, tables and lists. Title tags and meta descriptions shape understanding and display, but the content itself is what gets retrieved and quoted.

What is llms.txt and do I need it?

It's a proposed plain-text file at your site root describing your site for AI agents. A low-cost bonus some third-party agents may use, but Google says you don't need special AI files for generative search. Optional polish after the fundamentals.
