Definition

Content-Signal is an emerging robots.txt directive that lets a site separate three permissions: search indexing, AI retrieval (RAG), and AI training. It gives publishers granular control over how AI may use their content.

The directive is a single line in robots.txt that states a yes or no preference for each of the three purposes:

Content-Signal: search=yes, ai-input=yes, ai-train=no
  • search — allow classic indexing (Google, Bing)
  • ai-input — allow retrieval for live AI answers (RAG)
  • ai-train — allow use of content for model training
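To make the key=value structure concrete, here is a minimal sketch of how a tool might read such a line. The function name parse_content_signal is hypothetical, and the handling of malformed pairs (silently skipping them) is an assumption of this sketch rather than something the directive's definition above prescribes.

```python
def parse_content_signal(line: str) -> dict[str, bool]:
    """Parse one 'Content-Signal: k=v, k=v' line into {purpose: allowed}.

    Hypothetical helper for illustration only.
    """
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "content-signal":
        raise ValueError("not a Content-Signal line")

    signals: dict[str, bool] = {}
    for pair in value.split(","):
        key, eq, setting = pair.strip().partition("=")
        setting = setting.strip().lower()
        # Assumption: skip pairs that are not exactly key=yes or key=no
        # instead of guessing what the publisher meant.
        if not eq or setting not in ("yes", "no"):
            continue
        signals[key.strip().lower()] = setting == "yes"
    return signals


signals = parse_content_signal(
    "Content-Signal: search=yes, ai-input=yes, ai-train=no"
)
print(signals)
```

A purpose that does not appear in the line is simply absent from the result, so a caller can tell "no" apart from "not stated".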

Recommended posture

For most sites, search=yes, ai-input=yes, ai-train=no is a pragmatic default: you stay discoverable in search and eligible for citation in AI answers, without offering your content for model training. It cleanly separates "use my content to answer now" from "use my content to build your model."
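As an illustration, the recommended posture could be written out by a small generator like the one below. This is a sketch under assumptions: the function name render_robots_txt is hypothetical, and placing the Content-Signal line inside a "User-agent: *" group next to "Allow: /" is a common layout assumed here, not something this page specifies.

```python
def render_robots_txt(
    search: bool = True,
    ai_input: bool = True,
    ai_train: bool = False,
) -> str:
    """Return a minimal robots.txt body carrying one Content-Signal line.

    Hypothetical helper; defaults mirror the recommended posture.
    """

    def flag(allowed: bool) -> str:
        return "yes" if allowed else "no"

    signal = (
        f"Content-Signal: search={flag(search)}, "
        f"ai-input={flag(ai_input)}, ai-train={flag(ai_train)}"
    )
    # Assumption: the signal lives in the wildcard user-agent group.
    return "\n".join(["User-agent: *", signal, "Allow: /"]) + "\n"


print(render_robots_txt())
```

Calling render_robots_txt(ai_input=False) would produce the stricter variant that also opts out of live AI retrieval.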

Frequently asked questions

What does ai-input mean in Content-Signal?

ai-input governs retrieval-augmented generation: whether AI systems may fetch your page to ground a live answer for a user. Setting it to yes keeps you eligible to be cited in AI answers.

Can Content-Signal stop AI training on my content?

Not by itself. Setting ai-train=no states your preference, and crawlers that choose to comply will honor it. Like the rest of robots.txt, it is a declared policy rather than a technical block, so its effect depends on how each crawler behaves.
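Because enforcement sits on the crawler side, it can help to see what honoring the signal might look like there. The sketch below is an assumption-laden illustration, not any real crawler's logic: the names stated_preference and should_skip_training are hypothetical, the code ignores user-agent groups and takes the first matching signal in the file, and it treats a missing signal as "no preference stated" rather than as yes or no.

```python
import re
from typing import Optional

# Matches a Content-Signal line regardless of letter case.
SIGNAL_LINE = re.compile(r"^\s*content-signal\s*:\s*(.*)$", re.IGNORECASE)


def stated_preference(robots_txt: str, purpose: str) -> Optional[bool]:
    """Return True or False if the file says yes or no for `purpose`, else None."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments
        match = SIGNAL_LINE.match(line)
        if not match:
            continue
        for pair in match.group(1).split(","):
            key, _, setting = pair.partition("=")
            setting = setting.strip().lower()
            if key.strip().lower() == purpose and setting in ("yes", "no"):
                return setting == "yes"
    return None  # nothing stated for this purpose


def should_skip_training(robots_txt: str) -> bool:
    """Assumed policy: skip a page for training only on an explicit ai-train=no."""
    return stated_preference(robots_txt, "ai-train") is False


robots = (
    "User-agent: *\n"
    "Content-Signal: search=yes, ai-input=yes, ai-train=no\n"
    "Allow: /\n"
)
print(should_skip_training(robots))
```

What a crawler does when no preference is stated is its operator's decision; a more conservative pipeline could skip training in that case as well.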