Robots.txt was first proposed in 1994. It survived for 30 years almost untouched. It’s now obsolete for the questions AI raises, but it’s not going away, so you need to update your strategy around it rather than replace it.
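The file has two rule directives: Allow and Disallow. Per user-agent. That’s it, apart from the widely supported Sitemap line. Robots.txt can express which URL paths a named crawler may or may not fetch.
Robots.txt cannot express what a crawler may do with a page after fetching it: it has no way to distinguish citation from training, or to state license terms.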
The richer semantics now live in companion specifications: TDM-REP (machine-readable opt-outs), the X-Robots-Tag HTTP header, and the aiox:license field inside Capsules. Together they form the modern stack.
A reasonable baseline for a publisher who wants citations but not training:
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
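This disallows the training-focused tokens (GPTBot, ClaudeBot, Google-Extended, Bytespider, CCBot), allows the retrieval-focused ones (OAI-SearchBot, ChatGPT-User, PerplexityBot), and falls through to “allow” for the default user-agent so Googlebot and other classic crawlers stay welcome. Note that “disallow” is a request, not a block: a compliant crawler honours it, and nothing in the file stops a non-compliant one.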
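To sanity-check a baseline like the one above, you can run it through a parser and list the verdict per bot. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the `verdicts` helper and the `/pricing/` test URL are illustrative names chosen here, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The baseline from this section, embedded so the sketch is self-contained.
BASELINE = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def verdicts(robots_txt, bots, url):
    """Map each bot token to True (may fetch) or False (asked to stay out)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


if __name__ == "__main__":
    bots = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
            "Google-Extended", "PerplexityBot", "Bytespider", "CCBot",
            "Googlebot"]  # Googlebot has no group, so it uses the * default
    for bot, allowed in verdicts(BASELINE, bots, "https://example.com/pricing/").items():
        print(f"{bot:16} {'allow' if allowed else 'disallow'}")
```

The check only tells you what the file asks for. Whether a given crawler complies is a separate question.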
Robots.txt is advisory: a crawler can ignore it, and it carries no licensing terms. That’s why the modern approach pairs robots.txt with: (a) TDM-REP headers for legal opt-out signalling, (b) AIOX Capsule license fields for granular declarations, and (c) actual enforcement at the bot-management layer (which is what Bot Sentinel does).
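To make the difference between signalling and enforcement concrete, here is a rough sketch of a WSGI middleware that does both: it adds the TDM-REP `tdm-reservation: 1` response header and refuses requests whose User-Agent contains a disallowed token. This is not how Bot Sentinel works internally; `licensing_middleware`, `BLOCKED_TOKENS`, `demo_app`, and `call` are hypothetical names for illustration, and matching on the User-Agent string alone is easy to spoof, so treat it as a toy stand-in for a real bot-management layer.

```python
# Tokens refused at the HTTP layer. Google-Extended is omitted on purpose:
# it is a robots.txt control token and never appears as a request User-Agent.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider", "CCBot")


def licensing_middleware(app):
    """Wrap a WSGI app: refuse blocked bots, add a TDM-REP header otherwise."""
    def wrapped(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token.lower() in user_agent for token in BLOCKED_TOKENS):
            # Enforcement: the request is refused, whatever robots.txt says.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated access for training is not permitted.\n"]

        def start_with_signal(status, headers, exc_info=None):
            # Signalling: tdm-reservation: 1 declares that TDM rights are reserved.
            headers = list(headers) + [("tdm-reservation", "1")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_signal)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the site being protected."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def call(app, user_agent):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    site = licensing_middleware(demo_app)
    print(call(site, "Mozilla/5.0 (compatible; GPTBot/1.1)")[0])
    print(call(site, "Mozilla/5.0 (compatible; Googlebot/2.1)")[1])
```

The header tells well-behaved clients what you have reserved, while the 403 is the only part that does not depend on the client's cooperation.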
You configure your licensing posture once, in the Content Licensing app. AIOX emits all three layers from that single config.
And the Robots.txt Studio gives you a visual editor with a live bot-tester so you can ask “what would GPTBot see at /pricing/?” before you ship a rule change.
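If you want the same kind of pre-ship check in a script or CI job, one option is to compare the current and proposed files and list every verdict that flips. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in; it is not the Robots.txt Studio's implementation, and `changed_verdicts`, `CURRENT_RULES`, and `PROPOSED_RULES` are hypothetical names with made-up rules.

```python
from urllib.robotparser import RobotFileParser


def _parse(robots_txt):
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def changed_verdicts(current, proposed, bots, paths):
    """Return (bot, path, before, after) for each verdict the change flips."""
    old, new = _parse(current), _parse(proposed)
    changes = []
    for bot in bots:
        for path in paths:
            before = old.can_fetch(bot, path)
            after = new.can_fetch(bot, path)
            if before != after:
                changes.append((bot, path, before, after))
    return changes


CURRENT_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical change: let GPTBot read the pricing page only.
# Python's parser applies the first matching rule, while RFC 9309 uses the
# longest match; listing the Allow first gives the same answer under both.
PROPOSED_RULES = """\
User-agent: GPTBot
Allow: /pricing/
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, path, before, after in changed_verdicts(
        CURRENT_RULES, PROPOSED_RULES, ["GPTBot", "Googlebot"], ["/pricing/", "/blog/"]
    ):
        print(f"{bot} at {path}: {before} -> {after}")
```

An empty result means the edit changes nothing for the bots and paths you listed, which is itself worth knowing before you ship.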