Content Licensing for AI Training: Your Rights, Your Rules

“Can I copy this?” is a question with a legal answer that’s been clear for decades — copyright law, fair use, click-through licenses. “Can I train an AI model on this?” is a question whose legal answer is still being written, court case by court case, jurisdiction by jurisdiction. Until that settles, the safest position is to declare your terms explicitly and enforce them where you can.

The four uses you need to think about

“Use” isn’t one thing. There are at least four distinct AI use-cases, each with its own rights implications:

  1. Indexing — the AI fetches your URL and adds it to a retrieval index. Doesn’t train on the content. Cites you when answering. This is what PerplexityBot does. Most publishers welcome it.
  2. Citation — the AI quotes excerpts from your content in its answer. Usually with a link. This is what ChatGPT Search does. Requires attribution to be meaningful.
  3. Training — the content is ingested into a model’s training corpus. The model “learns” from it and may reproduce ideas (occasionally verbatim) in future responses. This is what GPTBot does. Permanent and difficult to revoke.
  4. Commercial inference — a paid AI product uses the content to answer commercial queries on behalf of paying customers. Distinct from “free” inference. Falls into a grey legal area that’s actively being litigated.

You can have different policies for each use. AIOX’s licensing vocabulary makes this explicit.
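
One way to picture a per-use policy is as a small lookup table keyed by crawler and use. The sketch below is illustrative only: the `Use` and `Decision` names, the policy values, and the `decide` helper are assumptions made for this example, not AIOX's actual vocabulary or API.

```python
from enum import Enum


class Use(Enum):
    INDEXING = "indexing"
    CITATION = "citation"
    TRAINING = "training"
    COMMERCIAL_INFERENCE = "commercial-inference"


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    LICENSE_REQUIRED = "license-required"


# Hypothetical policy table: one entry per (bot, use) pair.
POLICY = {
    ("PerplexityBot", Use.INDEXING): Decision.ALLOW,
    ("GPTBot", Use.TRAINING): Decision.DENY,
}

# Hypothetical fallback per use when a bot has no explicit entry.
DEFAULTS = {
    Use.INDEXING: Decision.ALLOW,
    Use.CITATION: Decision.ALLOW,
    Use.TRAINING: Decision.DENY,
    Use.COMMERCIAL_INFERENCE: Decision.LICENSE_REQUIRED,
}


def decide(bot: str, use: Use) -> Decision:
    """Return the explicit policy for (bot, use), else the per-use default."""
    return POLICY.get((bot, use), DEFAULTS[use])
```

Keying on the pair rather than on the bot alone is what lets the same crawler be welcome for indexing and refused for training.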

Three layers of expression

You declare your terms in three places, each handling a different audience:

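  • robots.txt: the long-standing standard. AI crawlers that honour it (most do, for now) follow the Allow / Disallow rules you set per bot.
  • TDM-REP headers: the newer machine-readable rights reservation. Article 53 of the EU AI Act obliges providers of general-purpose AI models to identify and respect machine-readable reservations made under EU copyright law, and TDM-REP is one way to express such a reservation. The header itself is a simple reserved / not-reserved flag; finer terms, such as treating training differently from inference, belong in the policy document it links to.
  • aiox:license field in Capsules: the granular layer. Per-bot, per-use, with an optional licenseUrl pointing to commercial terms.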

AIOX emits all three from a single configuration. You don’t maintain them separately.
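
To illustrate what "one configuration, three outputs" might look like, here is a minimal sketch that renders a robots.txt body, the two TDM-REP response headers (`tdm-reservation` and `tdm-policy`), and a capsule license entry from a single dictionary. The config shape, the function names, the example.com URLs, and the layout of the `aiox:license` entry are all assumptions for illustration; AIOX's real configuration format may differ.

```python
CONFIG = {
    "policy_url": "https://example.com/tdm-policy.json",  # hypothetical
    "license_url": "https://example.com/ai-licensing",    # hypothetical
    "bots": {
        "PerplexityBot": {"crawl": True, "training": "deny"},
        "GPTBot": {"crawl": False, "training": "deny"},
    },
}


def render_robots_txt(config: dict) -> str:
    """One User-agent group per bot, allowing or disallowing the whole site."""
    groups = []
    for bot, rules in config["bots"].items():
        directive = "Allow" if rules["crawl"] else "Disallow"
        groups.append(f"User-agent: {bot}\n{directive}: /")
    return "\n\n".join(groups) + "\n"


def render_tdm_headers(config: dict) -> dict:
    """TDM-REP headers; rights are reserved if any bot is denied training."""
    reserved = any(r["training"] != "allow" for r in config["bots"].values())
    headers = {"tdm-reservation": "1" if reserved else "0"}
    if reserved:
        headers["tdm-policy"] = config["policy_url"]
    return headers


def render_capsule_license(config: dict) -> dict:
    """Hypothetical shape for the aiox:license field, one entry per bot."""
    return {
        "aiox:license": {
            bot: {"training": rules["training"],
                  "licenseUrl": config["license_url"]}
            for bot, rules in config["bots"].items()
        }
    }


if __name__ == "__main__":
    print(render_robots_txt(CONFIG))
    print(render_tdm_headers(CONFIG))
```

Note that the TDM-REP header applies to the resource as a whole rather than to an individual bot, so this sketch reserves rights whenever any bot is denied training and leaves the per-bot detail to the other two layers.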

What “license required” can look like

For commercial AI use you can charge. The pattern: set the license to “license-required” with a licenseUrl pointing to a price page. When Bot Sentinel sees a request from a bot whose license is “license-required”, it returns 402 Payment Required with the licenseUrl in the Link header. The bot’s operator can visit the URL, pay, receive a signed license token, and present it on subsequent requests.
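
The request flow above can be sketched in a few lines. This is a simplified illustration, not Bot Sentinel's implementation: the HMAC-based token, the `handle_request` signature, the signing key, the example.com URL, and the use of `rel="license"` in the Link header are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"replace-with-a-real-secret"            # hypothetical signing key
LICENSE_URL = "https://example.com/ai-licensing"  # hypothetical price page


def sign_token(bot: str) -> str:
    """Issue a token after payment: an HMAC-SHA256 over the bot name."""
    return hmac.new(SECRET, bot.encode(), hashlib.sha256).hexdigest()


def token_is_valid(bot: str, token: str) -> bool:
    """Compare against the expected signature in constant time."""
    return hmac.compare_digest(sign_token(bot).encode(), token.encode())


def handle_request(bot: str, policy: str, token: Optional[str] = None):
    """Return (status, headers) for a request from `bot` under `policy`."""
    if policy != "license-required":
        return 200, {}
    if token is not None and token_is_valid(bot, token):
        return 200, {}
    # No valid license presented: point the operator at the price page.
    return 402, {"Link": f'<{LICENSE_URL}>; rel="license"'}


if __name__ == "__main__":
    print(handle_request("GPTBot", "license-required"))
    print(handle_request("GPTBot", "license-required", sign_token("GPTBot")))
```

A production token would normally also carry an expiry and the scope of the license, so that a single payment does not grant indefinite access.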

Some publishers using this pattern report meaningful licensing revenue from AI labs that previously scraped freely. It works because the alternative, blanket blocking on one side and an escalating scraping arms race on the other, is worse for both parties.

Legal grounding

None of this is legal advice; talk to your lawyer about your specific jurisdiction. But the general direction is reasonably clear:

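  • EU: Article 53 of the EU AI Act requires providers of general-purpose AI models to respect rights reservations (opt-outs) made under EU copyright law. TDM-REP, published by a W3C Community Group, is one machine-readable way to express that reservation.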
  • US: Fair use is being litigated. Several ongoing cases will set precedents. Explicit licensing positions strengthen your hand.
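  • UK: The government’s copyright and AI consultation, run through the IPO, proposed a text-and-data-mining exception with an opt-out (rights reservation) mechanism along similar lines.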

Most legal observers expect the compliance baseline to converge on machine-readable opt-outs being mandatory by 2027. Getting your licensing infrastructure in place now is both an SEO and a compliance win.