
Your Robots.txt Strategy Needs an AI-Era Update

Robots.txt was first proposed in 1994 and survived for 30 years almost untouched. It has no answer to the questions AI raises, but it’s not going away, so you need to update your strategy rather than replace it.

What robots.txt can and can’t say

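The file’s access rules come down to two directives, Allow and Disallow, applied per user-agent. (Sitemap is also widely supported, but it only points crawlers at your sitemap; it grants or denies nothing.) That’s it. Robots.txt can express: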

  • “GPTBot is allowed to crawl /blog/.”
  • “ClaudeBot is disallowed from /pricing/.”

Robots.txt cannot express:

  • “GPTBot may index this content but not use it for training.”
  • “PerplexityBot may use the content but must cite us.”
  • “ChatGPT-User may quote up to 200 characters.”
  • “Commercial AI may use this content only after purchasing a license at this URL.”

The richer semantics now live in companion specifications: TDM-REP (the TDM Reservation Protocol, for machine-readable opt-outs), the X-Robots-Tag HTTP header, and the aiox:license field inside Capsules. Together they form the modern stack.
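To make the header layer concrete, here is a minimal Python sketch of the response headers a TDM-REP opt-out might add. It assumes the `tdm-reservation` and `tdm-policy` header names from the TDM Reservation Protocol report; the function name and the policy URL are hypothetical and only there for illustration.

```python
from typing import Dict, Optional


def tdm_rep_headers(reserved: bool, policy_url: Optional[str] = None) -> Dict[str, str]:
    """Build TDM-REP style response headers (a sketch, not a full implementation).

    reserved   -- True if text-and-data-mining rights are reserved for this resource.
    policy_url -- optional URL of a machine-readable policy (hypothetical here).
    """
    headers = {"tdm-reservation": "1" if reserved else "0"}
    # A policy only makes sense when rights are actually reserved.
    if reserved and policy_url:
        headers["tdm-policy"] = policy_url
    return headers


# Example: reserve rights and point to an illustrative policy document.
example_headers = tdm_rep_headers(True, "https://example.com/policies/tdm.json")
for name, value in example_headers.items():
    print(f"{name}: {value}")
```

These headers would sit alongside robots.txt rather than replace it: the file says who may crawl, the headers say what the content may be used for.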

What your robots.txt should look like in 2026

A reasonable baseline for a publisher who wants citations but not training:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

This blocks training-focused crawlers, allows retrieval-focused ones, and falls through to “allow” for the default user-agent so Googlebot and other classic crawlers stay welcome.
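Before shipping a file like this, it is worth checking that it says what you think it says. The sketch below feeds the baseline to Python's standard-library `urllib.robotparser` and asks what each crawler may fetch. The test URL is illustrative, and the standard-library parser matches user-agents and rules more loosely than real crawlers do, so treat it as a sanity check rather than a guarantee of how any given bot behaves.

```python
from urllib.robotparser import RobotFileParser

# The baseline from above, inlined so the check runs without network access.
BASELINE = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a queryable parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(BASELINE)
TEST_URL = "https://example.com/blog/some-post/"  # illustrative URL

for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, TEST_URL) else "blocked"
    print(f"{agent:<15} {verdict}")
```

With this file, GPTBot and ClaudeBot should come back blocked, while OAI-SearchBot, ChatGPT-User and Googlebot (via the default group) come back allowed.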

Why static robots.txt isn’t enough

  • It has no vocabulary for terms. It can’t say “license required”, let alone attach a price.
  • It’s not authenticated. Rules match on a self-declared user-agent string, which anyone can spoof.
  • It’s not enforced. It’s purely advisory: well-behaved crawlers honour it, scrapers ignore it.
  • New AI products launch constantly with new user-agents you may not know about.

That’s why the modern approach pairs robots.txt with: (a) TDM-REP headers for legal opt-out signalling, (b) AIOX Capsule license fields for granular declarations, and (c) actual enforcement at the bot-management layer (which is what Bot Sentinel does).
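For a feel of what enforcement means in practice, here is a deliberately naive Python sketch: a WSGI wrapper that returns 403 to requests whose user-agent contains a blocked token. This is not how Bot Sentinel works (the article does not describe its internals); the token list and function names are assumptions. Because it trusts the user-agent string, it has exactly the spoofing weakness listed above, so a real bot-management layer would also need to verify who is actually sending the request.

```python
# Lower-cased tokens for the crawlers the baseline robots.txt disallows.
# Google-Extended is left out: as far as we know it is only a robots.txt
# control token and never appears as an HTTP user-agent.
BLOCKED_TOKENS = ("gptbot", "claudebot", "bytespider", "ccbot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the self-declared user-agent matches a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def block_training_bots(app):
    """Wrap a WSGI app so blocked crawlers get a 403 instead of content."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling not permitted.\n"]
        # Everyone else passes through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware
```

Unlike a Disallow line, the 403 does not depend on the crawler choosing to cooperate, which is the difference between advisory and enforced.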

How AIOX handles it

You configure your licensing posture once, in the Content Licensing app. AIOX emits all three layers from that single config:

  1. The robots.txt directives.
  2. The TDM-REP headers on every response.
  3. The aiox:license field inside every Capsule.

And the Robots.txt Studio gives you a visual editor with a live bot-tester so you can ask “what would GPTBot see at /pricing/?” before you ship a rule change.
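The single-config idea is easy to picture in code. The Python sketch below is not AIOX's implementation; `LicensingPosture` and both render functions are hypothetical names, and the `tdm-reservation` header name is assumed from the TDM Reservation Protocol report. It shows one config object driving both the robots.txt text and a response header. The Capsule field is left out because its format is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LicensingPosture:
    """One place to declare who may crawl and whether TDM rights are reserved."""
    blocked_agents: List[str] = field(default_factory=list)
    allowed_agents: List[str] = field(default_factory=list)
    sitemap_url: str = "https://example.com/sitemap.xml"
    reserve_tdm_rights: bool = True


def render_robots_txt(posture: LicensingPosture) -> str:
    """Emit one group per named agent, then the default group and sitemap."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in posture.blocked_agents]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in posture.allowed_agents]
    groups.append("User-agent: *\nAllow: /")
    groups.append(f"Sitemap: {posture.sitemap_url}")
    return "\n\n".join(groups) + "\n"


def render_headers(posture: LicensingPosture) -> Dict[str, str]:
    """Emit the response header that mirrors the same posture."""
    return {"tdm-reservation": "1" if posture.reserve_tdm_rights else "0"}


posture = LicensingPosture(
    blocked_agents=["GPTBot", "ClaudeBot", "CCBot"],
    allowed_agents=["OAI-SearchBot", "PerplexityBot"],
)
print(render_robots_txt(posture))
print(render_headers(posture))
```

The design point is that the two outputs cannot drift apart: change the posture once and every layer is regenerated from it.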