Free SEO Tool · by overrank
Free AI Bot robots.txt Generator
This tool covers 18 AI crawler user agents active on the web in 2026: some train models, some fetch pages live for AI answers, and some do both. Decide which ones can access your site with one toggle each. The tool outputs a clean robots.txt snippet that you paste into your existing file.
Use the presets at the top if you do not want to think about each one. We recommend allowing the live-fetch and search bots (so you get cited in AI answers) while blocking pure training bots like CCBot if you want to opt out of training datasets.
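A minimal sketch of what the toggles amount to, assuming each toggle is a plain allow/block flag. TOGGLES and build_snippet are hypothetical names used for illustration, not overrank's actual code, and only five of the 18 bots are shown.

```python
# Hypothetical toggle state: True = allowed, False = blocked.
TOGGLES = {
    "GPTBot": True,
    "ChatGPT-User": True,
    "OAI-SearchBot": True,
    "CCBot": False,
    "Bytespider": False,
}


def build_snippet(toggles: dict[str, bool]) -> str:
    """Return one User-agent block per bot, separated by blank lines."""
    blocks = []
    for agent, allowed in toggles.items():
        # An empty Disallow permits everything; "Disallow: /" blocks the whole site.
        rule = "Disallow:" if allowed else "Disallow: /"
        blocks.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_snippet(TOGGLES))
```

An empty Disallow line is the conventional way to say "nothing is off limits" for that user agent, which is why allowed bots still get their own block.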
Quick presets
13 allowed · 5 blocked
OpenAI
GPTBot · OpenAI · Crawls the web to train ChatGPT models.
ChatGPT-User · OpenAI · Fetches your pages live when ChatGPT browses the web for a user.
OAI-SearchBot · OpenAI · Indexes pages for ChatGPT Search results.
Anthropic
ClaudeBot · Anthropic · Crawls the web to train Claude models.
anthropic-ai · Anthropic · Legacy crawler. Mostly superseded by ClaudeBot but still respected.
claude-web · Anthropic · Fetches live pages when Claude browses the web for a user.
Perplexity
PerplexityBot · Perplexity · Indexes content for Perplexity answers.
Perplexity-User · Perplexity · Live fetch when a Perplexity user opens your page from an answer.
Google
Google-Extended · Google · Controls whether Google can use your content to train Gemini and other AI products.
⚠ Blocking does NOT remove you from Google Search. It only stops Gemini training.
Common Crawl
CCBot · Common Crawl · Open-data crawler. Common Crawl data is used by most major LLM training pipelines.
⚠ Blocking keeps you out of future Common Crawl snapshots, which feed most LLM training datasets. Pages already collected are not removed.
ByteDance
Bytespider · ByteDance · Crawls for ByteDance / TikTok AI training. Frequently aggressive.
Apple
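Applebot-Extended · Apple · Controls whether content crawled by Applebot can be used to train Apple Intelligence models. It is a control token, not a separate crawler.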
Meta
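Meta-ExternalAgent · Meta · Crawls for Meta AI training and product indexing.
FacebookBot · Meta · Crawls public pages to improve Meta's speech-recognition language models. Link previews for shared URLs come from a separate agent, facebookexternalhit.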
Amazon
Amazonbot · Amazon · Crawls for Alexa and Amazon AI services.
Diffbot
Diffbot · Diffbot · Commercial knowledge graph crawler. Used by enterprise AI products.
Imagesift
ImagesiftBot · Imagesift · Image data collector for AI training.
Webz.io
Omgilibot · Webz.io · News and forum aggregation crawler. Feeds many AI datasets.
robots.txt snippet
# AI crawler rules generated by overrank — https://www.overrank.ai/tools/ai-robots-txt-generator
User-agent: GPTBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: OAI-SearchBot
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: anthropic-ai
Disallow:

User-agent: claude-web
Disallow:

User-agent: PerplexityBot
Disallow:

User-agent: Perplexity-User
Disallow:

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Applebot-Extended
Disallow:

User-agent: Meta-ExternalAgent
Disallow:

User-agent: FacebookBot
Disallow:

User-agent: Amazonbot
Disallow:

User-agent: Diffbot
Disallow:

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /
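Before pasting, you may want to sanity-check that the rules do what you expect. The sketch below uses Python's standard urllib.robotparser on a shortened copy of the snippet. The allowed_agents helper and the example.com URL are hypothetical, and Python's parser only approximates how each real crawler matches user agents.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the generated rules (four of the 18 blocks).
SNIPPET = """\
User-agent: GPTBot
Disallow:

User-agent: ChatGPT-User
Disallow:

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether it may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SNIPPET,
    ["GPTBot", "ChatGPT-User", "Google-Extended", "CCBot"],
    "https://example.com/blog/post",  # hypothetical page URL
)

if __name__ == "__main__":
    for agent, ok in result.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```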
How to use this snippet:
- Open your existing robots.txt at your site root (or create one)
- Append these blocks to the end (or replace any existing AI bot blocks)
- Keep your existing Googlebot, Bingbot, and Sitemap directives untouched
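The steps above can be sketched as a small merge function. This is an illustration under stated assumptions, not how overrank does it: merge_robots and AI_BOTS are hypothetical names, the bot list is shortened, and the code assumes the blocks in your file are separated by blank lines.

```python
# Shortened, hypothetical list of AI bot names (lowercase for matching).
AI_BOTS = {"gptbot", "ccbot", "google-extended"}


def merge_robots(existing: str, ai_snippet: str) -> str:
    """Drop old AI-bot blocks from `existing`, then append `ai_snippet`.

    Assumes blocks are separated by blank lines.
    """
    kept = []
    for block in existing.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        agents = [
            line.split(":", 1)[1].strip().lower()
            for line in block.splitlines()
            if line.strip().lower().startswith("user-agent:")
        ]
        # Keep blocks with no User-agent line (e.g. Sitemap) and blocks
        # that name at least one non-AI agent (e.g. Googlebot, Bingbot).
        if not agents or any(agent not in AI_BOTS for agent in agents):
            kept.append(block)
    kept.append(ai_snippet.strip())
    return "\n\n".join(kept) + "\n"
```

Blocks that mix an AI bot with another agent are kept as-is, so review those by hand rather than trusting the merge.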
Frequently asked questions
Should I block AI crawlers from my site?
It depends on your goal. If you want to be cited in ChatGPT, Claude, and Perplexity answers, you should ALLOW their crawlers — blocking them removes you from their answer pool. If you are worried about training data scraping without attribution, you can selectively block training-focused bots while allowing the live-fetch and search bots.
What is the difference between GPTBot and ChatGPT-User?
GPTBot crawls the web to train future ChatGPT models. ChatGPT-User fetches your pages live in real time when a ChatGPT user asks a question that triggers browsing. You can allow one and block the other independently. Most sites want to allow ChatGPT-User (to get cited in answers) but might block GPTBot (training data).
Will blocking Google-Extended remove me from Google Search?
No. Google-Extended only controls Gemini and Google's AI training. Blocking it has zero impact on your Google Search rankings. Googlebot (which controls search indexing) is completely separate.
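To see why the two are independent, here is a small check with Python's standard urllib.robotparser. It is a sketch, assuming a robots.txt that contains only the Google-Extended block and a hypothetical example.com URL; Google's own matching logic may differ in detail from Python's parser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

page = "https://example.com/pricing"  # hypothetical URL

# No block names Googlebot, so it falls through to "allowed".
googlebot_ok = parser.can_fetch("Googlebot", page)
# The Google-Extended block disallows the whole site.
extended_ok = parser.can_fetch("Google-Extended", page)

if __name__ == "__main__":
    print("Googlebot allowed:", googlebot_ok)
    print("Google-Extended allowed:", extended_ok)
```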
Do these bots actually respect robots.txt?
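Major bots from OpenAI, Anthropic, Google, and Perplexity have publicly committed to honoring robots.txt, and Common Crawl honors it too. User-triggered fetchers are a partial exception: Perplexity documents that Perplexity-User generally ignores robots.txt because a user requested the fetch. Some smaller or scraper-style bots ignore the file entirely. For those you need IP-level blocking or Cloudflare bot management, not robots.txt.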
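As a rough idea of what IP-level blocking looks like, here is a minimal sketch using Python's standard ipaddress module. The is_blocked helper is hypothetical, and the networks listed are documentation-only ranges, not real bot addresses. In practice you would load ranges from your logs or from the operator's published list and enforce them in your firewall or web server.

```python
import ipaddress

# Hypothetical blocklist: documentation-only ranges (RFC 5737), not real bot IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True when `client_ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.44", "203.0.113.9"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```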
Where does the robots.txt file go?
At the root of your site, accessible at https://yoursite.com/robots.txt. Append these rules to your existing robots.txt rather than replacing the whole file — you want to keep your existing Googlebot, Sitemap, and other directives.
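If you are unsure which URL that is for a given page, this small sketch derives it with Python's standard urllib.parse. The robots_url helper is a hypothetical name. Note that each host, including a subdomain, has its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root robots.txt URL for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://yoursite.com/blog/post?ref=home"))
    print(robots_url("https://blog.yoursite.com/2026/launch"))
```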