Definition

An AI crawler is an automated agent operated by an AI provider that fetches web pages either to train models or to retrieve content for live answers. Examples include GPTBot, ClaudeBot and PerplexityBot; Google-Extended is usually listed alongside them, although it is a robots.txt control token rather than a separate bot. AI crawlers are distinct from classic search crawlers, though some companies run both. Site owners control their access mainly through robots.txt.

Common AI crawlers (2026)

  • GPTBot (OpenAI) — training; OAI-SearchBot / ChatGPT-User — search & browsing
  • ClaudeBot (Anthropic) — training; Claude-SearchBot — retrieval
  • PerplexityBot — retrieval
  • Google-Extended — Gemini training (separate from Googlebot)
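
As a rough illustration, the list above can be encoded as a lookup table for tagging requests in server logs. This is a minimal Python sketch, not a vendor-supplied API: the names AI_CRAWLERS and classify_user_agent are hypothetical, the sample header is illustrative rather than copied from vendor documentation, and matching on a substring of the User-Agent header is an assumption that will not catch spoofed or renamed agents. Google-Extended is left out because it is a robots.txt token, not a string that appears in request headers.

```python
from typing import Optional, Tuple

# Hypothetical lookup table built from the list above:
# user-agent token -> (operator, purpose).
AI_CRAWLERS = {
    "GPTBot": ("OpenAI", "training"),
    "OAI-SearchBot": ("OpenAI", "search and browsing"),
    "ChatGPT-User": ("OpenAI", "search and browsing"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-SearchBot": ("Anthropic", "retrieval"),
    "PerplexityBot": ("Perplexity", "retrieval"),
}


def classify_user_agent(header: str) -> Optional[Tuple[str, str]]:
    """Return (operator, purpose) if the header names a listed crawler, else None."""
    lowered = header.lower()
    for token, info in AI_CRAWLERS.items():
        # Case-insensitive substring match on the crawler token.
        if token.lower() in lowered:
            return info
    return None


# Illustrative header only; real crawler headers may differ.
sample = "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"
print(classify_user_agent(sample))
```

A user-agent string is self-reported, so a match here identifies what a request claims to be, not what it is.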

Controlling access

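Use per-crawler robots.txt rules to allow or disallow individual user agents, and the Content-Signal directive to state separate permissions for search indexing, live retrieval and model training. Both are declarations that crawlers honor voluntarily; neither enforces access technically.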
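
One possible policy, blocking training tokens while leaving retrieval and search crawlers allowed, can be sketched and checked with Python's standard-library robots.txt parser. The robots.txt content below is an example, not a recommendation. The Content-Signal line follows the search / ai-input / ai-train form published in Cloudflare's Content Signals Policy, and its exact syntax here should be treated as an assumption to verify against that policy. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Example policy: the first group blocks training tokens; the second group
# lets everyone else crawl and states usage preferences via Content-Signal.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENTS = [
    "GPTBot", "ClaudeBot", "Google-Extended",          # training
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",  # retrieval
    "Googlebot",                                        # classic search
]

# Placeholder URL; only the path is compared against the rules.
access = {a: parser.can_fetch(a, "https://example.com/article") for a in AGENTS}

for agent, allowed in access.items():
    print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

The standard-library parser skips lines it does not recognize, so this check covers only the Allow and Disallow rules; whether a given crawler reads the Content-Signal line depends on that crawler.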

Frequently asked questions

Should I block AI crawlers?

It depends on your goal. Blocking training crawlers (e.g. GPTBot, Google-Extended) while allowing retrieval crawlers (e.g. OAI-SearchBot, Claude-SearchBot, PerplexityBot) keeps your pages citable in AI answers without contributing them to model training. Blocking everything risks losing AI visibility entirely.

Is Google-Extended the same as Googlebot?

No. Googlebot handles search indexing; Google-Extended is a separate robots.txt token that controls whether your content is used to train Gemini. You can allow one and disallow the other.