Free SEO Tool · by overrank

Free AI Bot robots.txt Generator

This tool covers 18 AI crawlers that are scraping the web in 2026: some to train models, some to fetch pages live for AI answers, some both. You decide which ones can access your site, with one toggle each. The tool outputs a clean robots.txt snippet that you paste into your existing file.

Use the presets at the top if you do not want to think about each one. We recommend allowing the live-fetch and search bots (so you get cited in AI answers) while blocking pure training bots like CCBot if you want to opt out of training datasets.
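
As a rough illustration of what such a preset does, here is a minimal Python sketch. The names `PURPOSES`, `recommended_preset`, and `build_snippet` are hypothetical, and the purpose labels are taken from the bot descriptions on this page. It follows the recommendation stated above and is not the tool's actual implementation.

```python
# Hypothetical purpose labels for a few bots, based on the descriptions on this page.
PURPOSES = {
    "GPTBot": "training",
    "ChatGPT-User": "live-fetch",
    "OAI-SearchBot": "search",
    "CCBot": "training",
}


def recommended_preset(purposes: dict[str, str]) -> dict[str, bool]:
    """Map each bot to True (allow) unless its purpose is training."""
    return {bot: purpose != "training" for bot, purpose in purposes.items()}


def build_snippet(toggles: dict[str, bool]) -> str:
    """Render one robots.txt group per bot from allow/block toggles."""
    groups = []
    for bot, allowed in toggles.items():
        # An empty Disallow permits everything; "Disallow: /" blocks the whole site.
        rule = "Disallow:" if allowed else "Disallow: /"
        groups.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(groups) + "\n"


snippet = build_snippet(recommended_preset(PURPOSES))
print(snippet)
```

Each toggle becomes its own `User-agent` group, so a bot can be flipped later without touching the others.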

Quick presets

13 allowed · 5 blocked

OpenAI

GPTBot (OpenAI)

Crawls the web to train ChatGPT models.

ChatGPT-User (OpenAI)

Fetches your pages live when ChatGPT browses the web for a user.

OAI-SearchBot (OpenAI)

Indexes pages for ChatGPT Search results.

Anthropic

ClaudeBot (Anthropic)

Crawls the web to train Claude models.

anthropic-ai (Anthropic)

Legacy crawler. Mostly superseded by ClaudeBot but still respected.

claude-web (Anthropic)

Fetches live pages when Claude browses the web for a user.

Perplexity

PerplexityBot (Perplexity)

Indexes content for Perplexity answers.

Perplexity-User (Perplexity)

Live fetch when a Perplexity user opens your page from an answer.

Google

Google-Extended (Google)

Controls whether Google can use your content to train Gemini and other AI products.

Blocking does NOT remove you from Google Search. It only stops Gemini training.

Common Crawl

CCBot (Common Crawl)

Open-data crawler. Common Crawl data is used by most major LLM training pipelines.

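Blocking CCBot keeps your pages out of future Common Crawl snapshots, and with them many LLM training datasets at once. Snapshots that were already published are not affected.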

ByteDance

Bytespider (ByteDance)

Crawls for ByteDance / TikTok AI training. Frequently aggressive.

Apple

Applebot-Extended (Apple)

Apple Intelligence AI training crawler.

Meta

Meta-ExternalAgent (Meta)

Crawls for Meta AI training and link previews.

FacebookBot (Meta)

Renders link previews when your URL is shared on Facebook.

Amazon

Amazonbot (Amazon)

Crawls for Alexa and Amazon AI services.

Diffbot

Diffbot (Diffbot)

Commercial knowledge graph crawler. Used by enterprise AI products.

Imagesift

ImagesiftBot (Imagesift)

Image data collector for AI training.

Webz.io

Omgilibot (Webz.io)

News and forum aggregation crawler. Feeds many AI datasets.

robots.txt snippet

# AI crawler rules generated by overrank — https://www.overrank.ai/tools/ai-robots-txt-generator

User-agent: GPTBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: OAI-SearchBot
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: anthropic-ai
Disallow:

User-agent: claude-web
Disallow:

User-agent: PerplexityBot
Disallow:

User-agent: Perplexity-User
Disallow:

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Applebot-Extended
Disallow:

User-agent: Meta-ExternalAgent
Disallow:

User-agent: FacebookBot
Disallow:

User-agent: Amazonbot
Disallow:

User-agent: Diffbot
Disallow:

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

How to use this snippet:

  1. Open your existing robots.txt at your site root (or create one)
  2. Append these blocks to the end (or replace any existing AI bot blocks)
  3. Keep your existing Googlebot, Bingbot, and Sitemap directives untouched
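
The steps above can be sketched in Python as follows. This is a minimal sketch, not a full robots.txt parser: it assumes groups in the existing file are separated by blank lines, and `merge_ai_rules`, the sample file contents, and the example.com sitemap URL are all hypothetical.

```python
import re


def merge_ai_rules(existing: str, snippet: str, ai_bots: set[str]) -> str:
    """Drop groups that target only AI bots, keep everything else, append the snippet."""
    ai_lower = {bot.lower() for bot in ai_bots}
    kept = []
    # Assumption: groups are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", existing.strip()):
        if not block.strip():
            continue
        agents = re.findall(r"(?im)^user-agent:[ \t]*(\S+)", block)
        if agents and all(agent.lower() in ai_lower for agent in agents):
            continue  # an old AI bot group, replaced by the new snippet
        kept.append(block)  # Googlebot, Bingbot, Sitemap lines, and so on
    return "\n\n".join(kept + [snippet.strip()]) + "\n"


existing = (
    "User-agent: *\nDisallow: /admin\n\n"
    "User-agent: GPTBot\nDisallow: /\n\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
new_rules = "User-agent: GPTBot\nDisallow:\n"
merged = merge_ai_rules(existing, new_rules, {"GPTBot"})
print(merged)
```

Review the merged file by hand before deploying it, since a group that mixes AI and non-AI user agents is left untouched by this sketch.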

Frequently asked questions

Should I block AI crawlers from my site?

It depends on your goal. If you want to be cited in ChatGPT, Claude, and Perplexity answers, you should ALLOW their crawlers — blocking them removes you from their answer pool. If you are worried about training data scraping without attribution, you can selectively block training-focused bots while allowing the live-fetch and search bots.

What is the difference between GPTBot and ChatGPT-User?

GPTBot crawls the web to train future ChatGPT models. ChatGPT-User fetches your pages live in real time when a ChatGPT user asks a question that triggers browsing. You can allow one and block the other independently. Most sites want to allow ChatGPT-User (to get cited in answers) but might block GPTBot (training data).
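
To check that the two bots really are treated independently, you can run a pair of rules through Python's standard `urllib.robotparser`. The helper name `allowed` and the example.com URL are placeholders, and the rules shown are one possible choice rather than the tool's default.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow:
"""


def allowed(rules: str, user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


print("GPTBot:", allowed(RULES, "GPTBot"))
print("ChatGPT-User:", allowed(RULES, "ChatGPT-User"))
```

This only shows how a standards-following parser reads the rules. Whether a given crawler honors them is up to the crawler.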

Will blocking Google-Extended remove me from Google Search?

No. Google-Extended only controls Gemini and Google's AI training. Blocking it has zero impact on your Google Search rankings. Googlebot (which controls search indexing) is completely separate.
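
A quick way to see the separation is to parse a Google-Extended block with Python's standard `urllib.robotparser` and ask about both tokens. This is a sketch of rule matching only; the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: Google-Extended",
    "Disallow: /",
])

url = "https://example.com/pricing"
# The group names only Google-Extended, so it does not apply to Googlebot.
google_extended_ok = parser.can_fetch("Google-Extended", url)
googlebot_ok = parser.can_fetch("Googlebot", url)
print(google_extended_ok, googlebot_ok)
```

Because there is no `User-agent: *` group here, any crawler not named in the file, Googlebot included, has no rule that restricts it.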

Do these bots actually respect robots.txt?

Major bots from OpenAI, Anthropic, Google, and Perplexity have publicly committed to honoring robots.txt, and Common Crawl honors it too. Compliance is voluntary, though: robots.txt is a request, not an access control. Some smaller or scraper-style bots ignore it; for those you need IP-level blocking or Cloudflare bot management, not robots.txt.
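
For illustration, here is one way to refuse requests at the application layer instead of relying on robots.txt. Note that this swaps in a User-Agent check rather than the IP-level blocking mentioned above, so it is weaker: a scraper can spoof its User-Agent string. The `BLOCKED_TOKENS` list, `is_blocked`, and `block_bots` are hypothetical names, and the sketch assumes a Python WSGI application.

```python
# Hypothetical choice of bots to refuse; adjust to your own policy.
BLOCKED_TOKENS = ("bytespider", "imagesiftbot")


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocked tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


def block_bots(app):
    """Wrap a WSGI app so that blocked user agents receive a 403 response."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Usage would look like `application = block_bots(application)` in the module your WSGI server loads.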

Where does the robots.txt file go?

At the root of your site, accessible at https://yoursite.com/robots.txt. Append these rules to your existing robots.txt rather than replacing the whole file — you want to keep your existing Googlebot, Sitemap, and other directives.