"Block the Wrong Bot and You're Deleting Yourself From AI Answers"

#GEO #AI crawler #robots.txt #GPTBot #AI visibility

The most expensive misconception: “blocking AI” isn’t one switch

“I don’t want AI training on my content” is a reasonable instinct, so many site owners open robots.txt and block every AI-related crawler. The problem: a single AI company often runs more than one crawler, and each does a completely different job. Block them all and you usually also block the one that gets you cited.

The result is the most ironic kind of failure: you think you’re only refusing training, but you’re actually deleting yourself from AI answers, and because nothing visibly breaks, you have no idea.
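A quick way to see this failure is to parse a blunt “block every AI bot” robots.txt with Python’s standard-library urllib.robotparser and ask which agents may fetch a page. The file below is a hypothetical example that puts training and search tokens (taken from the crawler table later in this post) into one Disallow group, and the URL is a placeholder. robotparser is a simplified parser, so treat this as an illustration of what the rule says, not a prediction of how each vendor’s crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "block every AI bot" file: training AND search tokens
# share one Disallow group, which is exactly the mistake described above.
BLUNT_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-User
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/guide") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The search and user-triggered bots are refused along with the training bot.
for agent in ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Googlebot"]:
    print(agent, "allowed" if allowed(BLUNT_ROBOTS_TXT, agent) else "blocked")
```

Only Googlebot, which is not listed, still gets through; every search bot in the group is blocked just as firmly as GPTBot.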

Training bots and search bots are different crawlers

Most major AI companies split their crawlers by purpose: one collects content to train models, while others fetch pages for the AI’s search feature or in real time when a user asks a question, and those are the ones that cite you. Blocking the former doesn’t hurt visibility; blocking the latter takes you out of that engine’s answers.

| AI company | Training (low impact if blocked) | Search / live citation (you disappear if blocked) |
| --- | --- | --- |
| OpenAI | GPTBot | OAI-SearchBot, ChatGPT-User |
| Anthropic | ClaudeBot, anthropic-ai | Claude-User, Claude-SearchBot |
| Perplexity | (no separate training crawler documented) | PerplexityBot, Perplexity-User |
| Google | Google-Extended (opt out of Gemini training) | Googlebot (block it and you lose Search too) |
| Apple | Applebot-Extended | Applebot |

The point isn’t to memorize this table — it’s to understand that “block training” and “keep visibility” can both be true at once, as long as you can tell which bot is which.

The cost of getting it wrong: vanishing from AI answers

In traditional SEO, blocking the wrong crawler at least shows up as a ranking drop you can notice. AI citation is invisible: a user asks, the AI answers, and you’re not in it. There’s no notification, and no dashboard records the impression that never happened.

That’s what makes blocking the wrong AI bot so dangerous: there’s no alarm. By the time you notice “competitors get mentioned in ChatGPT and I don’t,” you’ve usually been missing out for a long time.

noai and robots.txt are not the same thing

A common confusion worth clearing up: the noai / noimageai meta tags on a page and the crawler rules in robots.txt are two different mechanisms. The former asks “don’t train on this page”; the latter controls “which crawler may fetch which paths.” Both rely on crawlers honoring them voluntarily: robots.txt is standardized (RFC 9309) but not enforceable, and noai / noimageai are a non-standard convention that only some crawlers recognize. Either can hurt your visibility if set too bluntly.
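To make the distinction concrete, the page-level hint lives inside each HTML page, while robots.txt is a single file at the site root, and a crawler that reads one is not obliged to read the other. Below is a minimal sketch using Python’s standard-library html.parser that lists the directives in a page’s robots meta tag. It assumes the commonly copied form `<meta name="robots" content="noai, noimageai">`; the names RobotsMetaScanner and page_level_directives are hypothetical. Finding the tag tells you what the page requests, not whether any crawler honors it.

```python
from html.parser import HTMLParser


class RobotsMetaScanner(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def page_level_directives(html: str) -> set[str]:
    """Return the lowercased robots-meta directives found in `html`."""
    scanner = RobotsMetaScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.directives


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(sorted(page_level_directives(page)))
```

Nothing here touches robots.txt: a page can carry noai while robots.txt allows every crawler, or the reverse, which is why the two need to be set deliberately and separately.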

So how should you set it?

In one line: block training, allow search.
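Here is a sketch of what “block training, allow search” can look like in robots.txt, checked with Python’s standard-library urllib.robotparser. It assumes the tokens in the crawler table above are still current; confirm them against each vendor’s documentation before deploying. Training tokens share one Disallow group, and everything else, including search and user-triggered fetchers, falls through to the `*` group. PerplexityBot stays allowed because Perplexity describes it as the crawler behind its search results, not a training crawler. Google-Extended and Applebot-Extended are control tokens rather than separate crawlers: Googlebot and Applebot still do the fetching, and the Disallow signals that the content should not be used for those companies’ generative AI models. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Sketch only: token names assumed current; verify with each vendor.
ROBOTS_TXT = """\
# Training / model-use opt-outs
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /

# Everyone else, including search and user-triggered fetchers
User-agent: *
Allow: /
"""

TRAINING = ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "Applebot-Extended"]
SEARCH = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User", "Googlebot", "Applebot",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/some-post"  # placeholder page
for agent in TRAINING + SEARCH:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:18} {verdict}")
```

Keeping the training tokens in their own group, rather than listing search bots anywhere near a `Disallow: /`, makes the intent readable at a glance and keeps a later edit to one group from silently affecting the other.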

For each vendor’s full crawler list and rule differences, see the earlier post: The 8 major AI crawlers — rule differences and best settings.

Why this isn’t “set it once and forget”

AI companies add and rename crawlers (the lists have changed several times in the past two years), and one typo in robots.txt, or one default toggle in Cloudflare, can shut your whole site to a given bot. Because lost AI citations raise no alarm, this isn’t a set-once task. It’s ongoing site-health maintenance: cross-check the latest crawler lists and verify your rules regularly. It is exactly the kind of invisible, slow-bleed problem that’s best watched continuously rather than discovered after the damage is done.
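Part of that verification can be automated. Below is a minimal sketch, assuming Python’s standard-library urllib.robotparser as the parser and a hand-maintained watch list (MUST_ALLOW, PATHS and blocked_visibility_bots are hypothetical names): given robots.txt text, it reports which search or user-triggered bots would be blocked. Two caveats apply. robotparser is simplified (the first matching group wins and user-agent tokens are matched by substring), so it approximates but does not replicate each vendor’s RFC 9309 implementation. And a block applied at a CDN or firewall never appears in robots.txt, so this check cannot see it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical watch list: tokens whose access drives AI search and citations.
MUST_ALLOW = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Googlebot", "Applebot",
]
# Hypothetical sample of pages that must stay reachable.
PATHS = ["/", "/blog/some-post"]


def blocked_visibility_bots(robots_txt: str, paths: list[str]) -> list[tuple[str, str]]:
    """Return every (agent, path) pair on the watch list that robots.txt blocks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent in MUST_ALLOW
        for path in paths
        if not parser.can_fetch(agent, path)
    ]


# Intended rules: block GPTBot, hide /drafts/ from everyone else.
GOOD = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

# Same file after an over-trimmed edit: "/drafts/" became "/".
BROKEN = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /
"""

for name, text in [("GOOD", GOOD), ("BROKEN", BROKEN)]:
    problems = blocked_visibility_bots(text, PATHS)
    print(name, "OK" if not problems else f"{len(problems)} blocked pairs, e.g. {problems[0]}")
```

To check the live file instead of a string, RobotFileParser also provides set_url() and read(), which fetch robots.txt over the network. Running that on a schedule, and updating MUST_ALLOW whenever vendors rename crawlers, turns this one-off check into ongoing monitoring.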