Question 1

Should I block AI crawlers from my site?

Accepted Answer

It depends on your goal. If you want to be cited in ChatGPT, Claude, and Perplexity answers, you should ALLOW their crawlers, because blocking them removes you from their answer pool. If you are worried about your content being scraped for training data without attribution, you can selectively block the training-focused bots while still allowing the live-fetch and search bots.
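The selective approach can be sketched as a small script that writes one robots.txt group per bot. This is a minimal sketch, not a definitive list: the two groupings and the helper name `build_rules` are assumptions made for illustration, and the user-agent tokens should be checked against each vendor's current documentation before use.

```python
# Hypothetical groupings for illustration; verify every token against the
# vendor's own crawler documentation before relying on it.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
LIVE_AND_SEARCH_BOTS = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot"]


def build_rules(blocked, allowed):
    """Return robots.txt text that disallows `blocked` agents and allows `allowed` agents."""
    lines = []
    for agent in blocked:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    for agent in allowed:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rules(TRAINING_BOTS, LIVE_AND_SEARCH_BOTS))
```

Each bot gets its own group, so moving a bot from one list to the other is the only change needed to flip its access.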

Question 2

What is the difference between GPTBot and ChatGPT-User?

Accepted Answer

GPTBot crawls the web to collect content that may be used to train future OpenAI models. ChatGPT-User fetches your pages in real time when a ChatGPT user asks a question that triggers browsing. Each has its own user-agent token, so you can allow one and block the other independently. Most sites want to allow ChatGPT-User (to get cited in answers) but might block GPTBot (to opt out of training).
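To confirm that the two tokens really are handled independently, you can run a candidate file through Python's standard `urllib.robotparser`. This is a sketch under assumptions: the rules, the sample URL, and the helper name `is_allowed` are illustrative, and the parser only models how a compliant crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Example rules: block the training crawler, allow the live-fetch agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://yoursite.com/blog/post"  # placeholder URL
    for agent in ("GPTBot", "ChatGPT-User"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
    # prints:
    # GPTBot False
    # ChatGPT-User True
```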

Question 3

Will blocking Google-Extended remove me from Google Search?

Accepted Answer

No. Google-Extended is a robots.txt token that only controls whether your content is used for Gemini and Google's AI training. Blocking it does not affect your Google Search indexing or rankings. Googlebot, which handles crawling for search indexing, is a completely separate token.
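The separation between the two tokens can be illustrated with a file that already has general rules. In this sketch the `/admin/` rule and the sample URLs are made up for the example. It only shows how the groups are matched: Googlebot keeps following the `*` group while Google-Extended gets its own. It says nothing about rankings themselves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical existing rules plus a Google-Extended opt-out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so it falls back to the `*` group.
googlebot_blog = parser.can_fetch("Googlebot", "https://yoursite.com/blog/post")
googlebot_admin = parser.can_fetch("Googlebot", "https://yoursite.com/admin/login")
# Google-Extended matches its own group and is disallowed everywhere.
extended_blog = parser.can_fetch("Google-Extended", "https://yoursite.com/blog/post")

if __name__ == "__main__":
    print(googlebot_blog, googlebot_admin, extended_blog)  # prints: True False False
```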

Question 4

Do these bots actually respect robots.txt?

Accepted Answer

Major bots from OpenAI, Anthropic, Google, and Perplexity have publicly committed to honoring robots.txt, and Common Crawl honors it too. Compliance is voluntary, though, and some smaller or scraper-style bots ignore the file. For those you need IP-level blocking or Cloudflare bot management, not robots.txt.

Question 5

Where does the robots.txt file go?

Accepted Answer

At the root of your site, accessible at https://yoursite.com/robots.txt. Append these rules to your existing robots.txt rather than replacing the whole file, so that you keep your existing Googlebot, Sitemap, and other directives.
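Appending rather than replacing can be sketched as a small helper that leaves every existing directive in place and adds the new rules only once. The marker comment and the function name `append_rules` are hypothetical, chosen for this example. Reading and writing the actual file is left to the caller.

```python
# Hypothetical marker comment used to detect that the rules were already added.
MARKER = "# --- AI bot rules ---"


def append_rules(existing, new_rules):
    """Return `existing` with `new_rules` appended once; prior directives are untouched."""
    if MARKER in existing:
        return existing  # already appended on a previous run
    base = existing.rstrip("\n")
    separator = "\n\n" if base else ""
    return f"{base}{separator}{MARKER}\n{new_rules.strip()}\n"


if __name__ == "__main__":
    current = (
        "User-agent: Googlebot\n"
        "Allow: /\n"
        "\n"
        "Sitemap: https://yoursite.com/sitemap.xml\n"
    )
    print(append_rules(current, "User-agent: GPTBot\nDisallow: /"))
```

Running the helper a second time on its own output returns the text unchanged, so a deploy script can call it repeatedly without stacking duplicate groups.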

Free AI Bot robots.txt Generator

