The most expensive misconception: “blocking AI” isn’t one switch
“I don’t want AI taking my content to train on” — a reasonable instinct, so many people go into robots.txt and block every AI-related crawler. The problem: a single AI company often sends more than one crawler, each doing completely different things. Block them all and you usually also cut the one that gets you cited.
The result is the most ironic kind of failure: you think you’re only refusing training, but you’re actually deleting yourself from AI’s answers — and because it’s invisible, you have no idea.
Training bots and search bots are different crawlers
Most mainstream AI companies split their crawlers into two purposes: one fetches content to train models, the other fetches in real time when a user asks a question, then cites you. Blocking the former doesn’t hurt visibility; blocking the latter means leaving that engine’s answers.
| AI company | Training (low impact if blocked) | Search / live citation (you disappear if blocked) |
|---|---|---|
| OpenAI | GPTBot | OAI-SearchBot, ChatGPT-User |
| Anthropic | ClaudeBot, anthropic-ai | Claude-User, Claude-SearchBot |
| Perplexity | PerplexityBot | Perplexity-User |
| Google | Google-Extended (opt out of Gemini training) | Googlebot (block it and you lose Search too) |
| Apple | Applebot-Extended | Applebot |
The point isn’t to memorize this table — it’s to understand that “block training” and “keep visibility” can both be true at once, as long as you can tell which bot is which.
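If you want to tell the two kinds of bot apart in your own server logs, a small lookup is enough. The sketch below is only an illustration: the `CRAWLER_PURPOSE` dictionary and the `classify` helper are hypothetical names, and the tokens are copied from the table above, so treat them as a snapshot rather than an authoritative list.

```python
# Tokens copied from the table above. Vendors rename crawlers,
# so this mapping is an assumption that needs periodic review.
CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "anthropic-ai": "training",
    "PerplexityBot": "training",
    "Google-Extended": "training",
    "Applebot-Extended": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "search",
    "Claude-User": "search",
    "Claude-SearchBot": "search",
    "Perplexity-User": "search",
    "Googlebot": "search",
    "Applebot": "search",
}


def classify(user_agent: str) -> str:
    """Return 'training', 'search', or 'unknown' for a token or raw User-Agent string."""
    ua = user_agent.lower()
    # Try longer tokens first so 'Applebot-Extended' is not mistaken for 'Applebot'.
    for token in sorted(CRAWLER_PURPOSE, key=len, reverse=True):
        if token.lower() in ua:
            return CRAWLER_PURPOSE[token]
    return "unknown"


if __name__ == "__main__":
    for ua in ["GPTBot", "OAI-SearchBot", "Applebot-Extended", "curl/8.0"]:
        print(ua, "->", classify(ua))
```

Anything that comes back as "search" is a bot you almost certainly want to keep allowed.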
The cost of getting it wrong: vanishing from AI answers
In traditional SEO, blocking the wrong crawler at least shows up as a ranking drop you can notice. But AI citation is invisible: a user asks, AI answers, you’re not in it — no notification, no “impression that didn’t happen” in any dashboard.
That’s what makes blocking the wrong AI bot so dangerous: there’s no alarm. By the time you notice “competitors get mentioned in ChatGPT and I don’t,” you’ve usually been missing out for a long time.
noai and robots.txt are not the same thing
A common confusion worth clearing up: the noai / noimageai meta tags on a page, and robots.txt crawler rules, are two different mechanisms. The former asks “don’t train on this page”; the latter controls “which crawler may fetch which paths.” Both rely on crawlers honoring them voluntarily. robots.txt is a published standard (RFC 9309) and noai is only an informal convention, but neither can be enforced, and both can hurt your visibility if set too bluntly.
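To see that the page-level tag lives in a different place from robots.txt, here is a minimal sketch that reads the robots meta directives out of a page's HTML. It assumes the tag takes the commonly seen form `<meta name="robots" content="noai, noimageai">`; the `RobotsMetaParser` class and `page_directives` helper are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noai, noimageai".
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def page_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
    print(sorted(page_directives(sample)))
```

Nothing here touches robots.txt: a crawler has to fetch the page before it can even see this tag, which is why the two mechanisms need to be checked separately.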
So how should you set it
In one line: block training, allow search.
- To opt out of training, write rules for the training UAs (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended…);
- Always allow the search / live-citation UAs (OAI-SearchBot, Claude-User, Perplexity-User…), or you’re actively opting out of AI answers;
- After editing, cross-check against each vendor’s crawler docs to confirm you blocked the bot you think you did.
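The three steps above can be sketched as a small script that writes the rules and then checks them. This is a rough illustration, not a drop-in config: the UA lists only repeat the names given in the bullets, `build_robots_txt` and `can_fetch` are hypothetical helpers, and `https://example.com/` is a placeholder URL. The check uses Python's standard `urllib.robotparser`, whose matching may differ slightly from how each vendor's crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Names taken from the bullets above; confirm them against vendor docs before use.
TRAINING_UAS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]
SEARCH_UAS = ["OAI-SearchBot", "Claude-User", "Perplexity-User"]


def build_robots_txt(training, search):
    """Emit one group per crawler: block training bots, allow search bots."""
    lines = []
    for ua in training:
        lines += [f"User-agent: {ua}", "Disallow: /", ""]
    for ua in search:
        lines += [f"User-agent: {ua}", "Allow: /", ""]
    return "\n".join(lines)


def can_fetch(robots_txt, ua, url="https://example.com/"):
    """Ask the standard-library parser whether `ua` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(ua, url)


if __name__ == "__main__":
    robots = build_robots_txt(TRAINING_UAS, SEARCH_UAS)
    print(robots)
    for ua in TRAINING_UAS + SEARCH_UAS:
        print(f"{ua:20} allowed={can_fetch(robots, ua)}")
```

Writing the search bots out explicitly is a deliberate choice: a crawler with no matching group is allowed by default, but an explicit group survives if someone later adds a blanket `User-agent: *` block.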
For each vendor’s full crawler list and rule differences, see the earlier post: The 8 major AI crawlers — rule differences and best settings.
Why this isn’t “set it once and forget”
AI companies add and rename crawlers (the lists have changed several times in the past two years), and one typo in robots.txt, or one default toggle in Cloudflare, can close your whole site to a given bot. Because lost AI citations trigger no alarm, this isn’t a set-once task. It’s ongoing site-health maintenance: cross-check the latest crawler lists and verify your rules regularly, so an invisible, slow-bleed problem gets caught early rather than discovered after the damage is done.
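One way to make that regular check cheap is a tiny audit you can run whenever robots.txt changes. The sketch below is an assumption-heavy illustration: `SEARCH_UAS` repeats the search / live-citation names from the table earlier in this post, `blocked_search_bots` is a hypothetical helper, and `https://example.com/` is a placeholder. It only reads robots.txt text, so it cannot see blocks applied elsewhere, such as a CDN or firewall setting.

```python
from urllib.robotparser import RobotFileParser

# Search / live-citation crawlers to watch; update this list as vendors change names.
SEARCH_UAS = [
    "OAI-SearchBot",
    "ChatGPT-User",
    "Claude-User",
    "Claude-SearchBot",
    "Perplexity-User",
    "Googlebot",
    "Applebot",
]


def blocked_search_bots(robots_txt, url="https://example.com/"):
    """Return the search crawlers that this robots.txt text would shut out of `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in SEARCH_UAS if not parser.can_fetch(ua, url)]


if __name__ == "__main__":
    # A blanket rule like this is the classic way to vanish without noticing.
    risky = "User-agent: *\nDisallow: /\n"
    print("Blocked search bots:", blocked_search_bots(risky))
```

An empty list means none of the watched bots is blocked by robots.txt; any name in the list is worth investigating before it costs you citations.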
this is the third article ive read this week saying basically the same stuff and none of them tell you what to actually DO on monday morning
wait so do i need to pay for chatgpt to get my shop to show up in it?? sorry if dumb question im not techy, i just run a small bakery and my niece said i should look into this
doesnt work for me
good read but the part about structured content felt a little thin. would love an actual before/after of a page that started getting picked up
wait so do i need to pay for chatgpt to get my shop to show up in it?? sorry if dumb question
not a dumb q — no, paying for chatgpt does nothing for that. it's about your site/info being clear and trustworthy enough that the model picks you when someone asks. paying just gets YOU the fancier model, doesn't make it mention you.
as a small business owner i'm tired lol. just got the hang of google reviews and now there's a whole new thing
good stuff. i run a small agency and we've been quietly doing this kind of work for clients for about a year, happy to compare notes with anyone here who's experimenting, no pitch just nerding out
the part about schema markup is slightly off. the engines aren't 'reading' json-ld the way you imply, most of them rely on the rendered text + retrieval from an index. structured data helps disambiguate entities but it's not the primary signal. worth clarifying so people don't go spend a week on schema thinking it's the magic switch
ok but how is this any different from seo with extra steps? feels like every few years someone renames the same thing
Fair pushback 😅 It's not a replacement — clean site + good content is still the foundation, GEO just sits on top of it. The 'extra steps' are mostly about being the source an answer is built from, not one of ten blue links it ignores. Not enemies, same foundation.
ok but how is this any different from seo with extra steps? feels like every few years someone renames the same thing and sells it back to us
the bit about being the source the model summarizes instead of one of ten links it ignores actually reframed it for me. thanks
honestly half of this reads like the early seo blogs from 15 yrs ago. 'do good content, be the authority, get mentioned in trustworthy places' ... yeah we know, the hard part was always the how and that's exactly the part everyone skips
wait so is this just seo with extra steps
saved this. been trying to figure out why we show up on google fine but the AI answers never mention us. makes more sense now