Few topics generate more bad advice than "metadata for AI." The pitch is seductive: add the right markup and the machines will cite you. The evidence says otherwise. Let's separate what models genuinely read from what merely feels productive.

The hierarchy of what gets read

Rank the inputs by how much they influence whether you are retrieved and quoted:

| Signal | Read by models? | What it's really for |
| --- | --- | --- |
| Rendered body content | Heavily | Retrieval and citation (the main event) |
| Headings (H1–H3) | Yes | Structure, retrievable answer units |
| Visible dates on page | Yes | Freshness signal (~40% of citability) |
| Title tag | Yes | Topic understanding + SERP display |
| Meta description | Lightly | SERP snippet; weak ranking influence |
| JSON-LD structured data | Minimally for citations | Google rich results |
| llms.txt / AI files | Some third-party agents | Optional bonus, not required |

The schema-for-citations myth

This is the correction that saves the most wasted effort. Adding JSON-LD does not meaningfully increase AI citations. Ahrefs tracked 1,885 pages that added schema between August 2025 and March 2026 and measured citations across AI Overviews, AI Mode and ChatGPT: the markup produced no meaningful uplift on any platform (Ahrefs, 2026).

There's a tempting counter-statistic: cited pages are more likely to carry JSON-LD than non-cited pages. But that's correlation, not cause: authoritative pages tend to do everything well, schema included. The controlled before/after test is the one that settles it, and it found no lift.

This does not mean "skip schema." It means deploy schema for the right reason: Google rich results. FAQ rich snippets, article cards, breadcrumb trails, product stars: those come from JSON-LD and are worth having. Just don't expect AI visibility to follow from markup. It follows from content.

The metadata that does pay off

Title tags

Still one of the highest-leverage fields. Front-load the primary term, keep it under ~60 characters, and write it as the answer to a query, not a brand slogan. Models use it to understand topic; humans use it to choose the click.
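The title guidance above can be turned into a quick lint. This is a minimal sketch, not a standard tool: the function name check_title is hypothetical, the 60-character limit comes from the rule of thumb above, and "front-loaded" is approximated (as an assumption) by requiring the primary term to start within the first third of the title.

```python
from typing import List


def check_title(title: str, primary_term: str, max_len: int = 60) -> List[str]:
    """Return a list of problems; an empty list means the title passes."""
    problems = []
    text = " ".join(title.split())  # collapse stray whitespace

    if len(text) > max_len:
        problems.append(f"too long: {len(text)} > {max_len} characters")

    # Assumption: "front-loaded" means the term starts in the first third.
    pos = text.lower().find(primary_term.lower())
    if pos == -1:
        problems.append("primary term missing")
    elif pos > len(text) // 3:
        problems.append("primary term not front-loaded")

    return problems


if __name__ == "__main__":
    print(check_title("JSON-LD for AI Citations: What Actually Works", "JSON-LD"))
    print(check_title("Acme Corp | Welcome to our blog about JSON-LD", "JSON-LD"))
```

The thresholds are judgment calls; adjust them to match how your own SERP titles get truncated.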

Visible dates

Freshness is roughly 40% of citability, and models read the rendered date, not just the JSON-LD field. Show "Updated June 2026" in the page body. A genuinely updated, dated page beats an undated one with identical facts.
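One way to confirm a date is visible in the rendered text, and not only inside a JSON-LD field, is to strip script and style bodies and search what remains. The sketch below uses only the Python standard library; the names VisibleText and visible_date are hypothetical, and the pattern assumes English phrasing such as "Updated June 2026" or "Published on June 3, 2026".

```python
import re
from html.parser import HTMLParser
from typing import Optional


class VisibleText(HTMLParser):
    """Collect text a reader would see, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._hidden = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.parts.append(data)


MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Assumed phrasing: "Updated June 2026", "Published on June 3, 2026".
DATE_RE = re.compile(
    rf"\b(?:Updated|Published)\s+(?:on\s+)?((?:{MONTHS})\s+(?:\d{{1,2}},\s+)?\d{{4}})",
    re.IGNORECASE,
)


def visible_date(html: str) -> Optional[str]:
    """Return the first visible 'Updated <date>' string, or None."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    match = DATE_RE.search(text)
    return match.group(1) if match else None
```

A page whose only date lives in a JSON-LD dateModified field returns None here, which is exactly the case the paragraph above warns about.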

Meta descriptions

A weak ranking signal but a real influence on click-through from classic SERPs. Write conversationally and accurately; don't keyword-stuff. Google often rewrites them anyway from the best-matching passage.

Open Graph & canonical

Open Graph controls how your link renders when shared — including inside some AI and social surfaces. A canonical tag prevents duplicate-URL dilution. Both are hygiene, not growth levers, but missing them creates avoidable problems.
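A hygiene check for these tags can be sketched with the standard library. The four properties listed are the ones the Open Graph protocol marks as required; the class and function names (HeadTags, missing_head_tags) are hypothetical, and the script only checks presence, not whether the values are sensible.

```python
from html.parser import HTMLParser
from typing import List


class HeadTags(HTMLParser):
    """Record Open Graph <meta> properties and the canonical <link>."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = a.get("content") or ""
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")


# Properties the Open Graph protocol lists as required for every page.
REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


def missing_head_tags(html: str) -> List[str]:
    """Return the names of required tags that are absent or empty."""
    parser = HeadTags()
    parser.feed(html)
    parser.close()
    missing = [key for key in REQUIRED_OG if not parser.og.get(key)]
    if not parser.canonical:
        missing.append("canonical")
    return missing
```

Run it against the raw server response for a handful of templates rather than every URL; missing tags are usually a template problem, not a per-page one.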

How to implement JSON-LD correctly

If you do add structured data — and you should, for rich results — there is exactly one correct pattern: inline JSON-LD in the server-rendered HTML.

<script type="application/ld+json">
{ "@context": "https://schema.org", "@type": "Article", ... }
</script>

Common ways to get it wrong:

  • Injected after hydration (via client JS or useEffect) — invisible to the initial crawl.
  • Microdata on a collapsed accordion — the answer isn't in the server HTML, so Google sees a Question with no Answer and rejects the rich result.
  • Empty description — silently rejected.
  • Mixing microdata and JSON-LD on the same entity — contradictory signals.

Validate with the Google Rich Results Test and confirm the script appears in view source, not just the live DOM.
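The "view source, not live DOM" check can also be scripted. The sketch below parses a raw HTML string (what the server sent, before any JavaScript runs) and flags three of the failure modes listed above: no JSON-LD at all, an empty description, and a Question with no acceptedAnswer. It complements the Rich Results Test rather than replacing it, and the names JsonLdBlocks and audit_json_ld are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import List


class JsonLdBlocks(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None

    def handle_starttag(self, tag, attrs):
        kind = (dict(attrs).get("type") or "").lower()
        if tag == "script" and kind == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None


def _walk(node):
    """Yield every dict nested anywhere inside a parsed JSON-LD value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for value in node:
            yield from _walk(value)


def audit_json_ld(server_html: str) -> List[str]:
    """Return problems found in the JSON-LD of raw, server-rendered HTML."""
    parser = JsonLdBlocks()
    parser.feed(server_html)
    parser.close()
    if not parser.blocks:
        return ["no JSON-LD found in server HTML"]

    problems = []
    for i, raw in enumerate(parser.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {i}: invalid JSON ({exc.msg})")
            continue
        for item in _walk(data):
            if "description" in item:
                desc = item["description"]
                if desc is None or not str(desc).strip():
                    problems.append(f"block {i}: empty description")
            if item.get("@type") == "Question" and not item.get("acceptedAnswer"):
                problems.append(f"block {i}: Question without acceptedAnswer")
    return problems
```

Feed it the body of a plain HTTP fetch (for example via urllib.request), not HTML serialized from a headless browser, so that markup injected after hydration shows up as missing.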

The AI-specific files (optional bonus)

Files like /llms.txt and /.well-known/mcp.json describe your site to third-party agents. Some may use them; Google has explicitly stated you do not need special AI files to appear in generative search. Treat them as cheap polish after the fundamentals, and use robots.txt with a Content-Signal line to govern how AI crawlers may use your content (search, retrieval, training).
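As an illustration of what such a robots.txt group might look like, the sketch below renders one from three flags. The key names search, ai-input and ai-train are an assumption taken from the publicly proposed Content Signals convention, mapped here to the search, retrieval and training uses mentioned above; the article itself does not specify the syntax, and crawlers are not obliged to honor it. The function name content_signal_robots is hypothetical.

```python
def content_signal_robots(search: bool, ai_input: bool, ai_train: bool,
                          user_agent: str = "*") -> str:
    """Render a robots.txt group carrying a Content-Signal line.

    Assumed key names (not confirmed by the article):
    search -> search indexing, ai-input -> retrieval, ai-train -> training.
    """
    def yn(flag: bool) -> str:
        return "yes" if flag else "no"

    signal = (f"search={yn(search)}, "
              f"ai-input={yn(ai_input)}, "
              f"ai-train={yn(ai_train)}")
    lines = [
        f"User-agent: {user_agent}",
        f"Content-Signal: {signal}",
        "Allow: /",
        "",  # trailing newline at end of file
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Allow search and retrieval, opt out of training.
    print(content_signal_robots(search=True, ai_input=True, ai_train=False))
```

The signal expresses a preference only; blocking a crawler outright still requires a Disallow rule or server-side enforcement.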

Metadata is plumbing, not magic. Get the title, the visible date and the rendered content right, add schema for rich results, and stop chasing markup as a citation hack. The machine reads your words first.

Frequently asked questions

Does JSON-LD structured data help AI engines cite my page?

Barely. Measurement shows schema markup has near-zero effect on AI citations — within a few percent of nothing, sometimes negative. JSON-LD is still worth adding for Google rich results, but it is not a citation lever. Quotable content, visible dates and entity authority are.

What metadata do large language models actually read?

Models read the rendered HTML far more than any single field: title, headings, body, visible dates, tables and lists. Title tags and meta descriptions shape understanding and display, but the content itself is what gets retrieved and quoted.

What is llms.txt and do I need it?

It's a proposed plain-text file at your site root describing your site for AI agents. It is a low-cost bonus that some third-party agents may use, but Google says you don't need special AI files for generative search. Treat it as optional polish after the fundamentals.