Robots.txt Validator and Checker

Paste your robots.txt file to validate syntax, inspect user-agent rules, test whether specific URLs are blocked, and check AI crawler access for GPTBot and ClaudeBot. Free tool, no signup required.

Your robots.txt file is one of the most consequential files on your website, and it is easy to leave untouched after the day it was created. A misconfigured robots.txt can accidentally block search engines from crawling your most important pages, block AI crawlers from reading your content, or leave your admin areas open to unnecessary crawling.

The robots.txt conversation has become more urgent in 2025 and 2026 as AI systems have entered the crawling landscape. GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers now fetch web content for model training and for retrieval in AI assistants. If your robots.txt blocks these crawlers, compliant AI systems are far less likely to be able to cite, surface, or recommend your content, regardless of how good it is. Some sites block AI crawlers unintentionally through overly broad Disallow rules, without realising the consequences for generative engine optimisation (GEO).

This validator parses your robots.txt file, checks for common errors and misconfigurations, identifies AI crawler access rules, and lets you test specific URLs against your current directives. Paste your file or use the fetch mode to retrieve it live from any domain.

Fetch mode retrieves the live /robots.txt from your server through ours, then validates it. It is rate-limited to 10 fetches per hour.

Methodology

The validator parses your robots.txt file by grouping directives under their User-agent declarations and checking each rule against the Robots Exclusion Protocol, standardised as RFC 9309. It checks for the presence of a wildcard User-agent rule, sitemap declarations, admin path protections, duplicate directives, and AI crawler access.
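The grouping step can be illustrated with a short Python sketch. This is not the tool's actual code: `Group`, `parse_robots`, and the wording of the issue messages are hypothetical names chosen for illustration, and the sketch only covers a few of the checks listed above (duplicates, rules outside a group, and a missing wildcard group).

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One User-agent group: the agents it names and its (directive, path) rules."""
    agents: list = field(default_factory=list)
    rules: list = field(default_factory=list)


def parse_robots(text):
    """Group Allow/Disallow rules under their User-agent lines.

    Returns (groups, sitemaps, issues). Hypothetical helper for illustration.
    """
    groups, sitemaps, issues = [], [], []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append(f"line {lineno}: missing ':' separator")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent
            # line that follows rules starts a new group.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
        elif key in ("allow", "disallow"):
            if current is None:
                issues.append(f"line {lineno}: {key} before any User-agent")
                continue
            if (key, value) in current.rules:
                issues.append(f"line {lineno}: duplicate {key}: {value}")
            current.rules.append((key, value))
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines apply to the whole file
        # Any other directive (for example Crawl-delay) is ignored here.
    if not any("*" in g.agents for g in groups):
        issues.append("no wildcard User-agent group")
    return groups, sitemaps, issues


sample = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""
groups, sitemaps, issues = parse_robots(sample)
print(len(groups), sitemaps, issues)
```

For the sample file, the sketch finds two groups and one sitemap, and reports the repeated Disallow line as its only issue.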

The URL tester applies the same rule matching logic used by Googlebot, which is also the precedence rule in RFC 9309: when several rules match a URL, the rule with the longest path wins, and if an Allow rule and a Disallow rule of equal length both match, the Allow rule wins. A URL that matches no rule is allowed. The tester shows you exactly which rule was matched and whether it results in the URL being allowed or blocked.
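A minimal Python sketch of this precedence logic follows, assuming specificity is measured by the length of the rule path and that `*` and a trailing `$` are the only special characters. The names `match_url` and `_pattern_to_regex` are hypothetical, and the sketch skips details such as percent-encoding normalisation, so treat it as an illustration rather than the tester's implementation.

```python
import re


def _pattern_to_regex(path):
    """Translate a robots.txt path ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(part) for part in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def match_url(rules, url_path):
    """Return (verdict, matched_rule) for url_path.

    rules is a list of (directive, path) pairs from one User-agent group.
    The longest matching path wins; on a tie, Allow beats Disallow.
    A URL that matches no rule is allowed.
    """
    best = None
    for directive, path in rules:
        if not path:  # an empty Disallow matches nothing
            continue
        if _pattern_to_regex(path).match(url_path):
            candidate = (len(path), directive == "allow", directive, path)
            if best is None or candidate[:2] > best[:2]:
                best = candidate
    if best is None:
        return "allowed", None
    verdict = "allowed" if best[2] == "allow" else "blocked"
    return verdict, f"{best[2].capitalize()}: {best[3]}"


rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/*.pdf$"),
    ("allow", "/private"),
    ("disallow", "/private"),
]
for path in ["/wp-admin/admin-ajax.php", "/wp-admin/options.php",
             "/docs/report.pdf", "/private/x", "/blog/"]:
    print(path, match_url(rules, path))
```

Comparing `(length, is_allow)` tuples keeps the tie-break in one place: a longer path always wins, and only when lengths are equal does the Allow flag decide.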

AI crawler detection specifically checks for GPTBot, ClaudeBot, PerplexityBot, and CCBot rules. If any of these are blocked, the tool flags it as a GEO consideration because blocking these crawlers prevents your content from being ingested for AI retrieval and citation.
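One way to approximate this check yourself is with Python's standard `urllib.robotparser` module. The sketch below is an assumption about how such a check could look, not the tool's implementation: `AI_CRAWLERS` and `ai_crawler_access` are hypothetical names, and the standard-library parser applies rules in file order rather than by longest match, so results can differ from the URL tester for files that mix Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# The four user agents named in the methodology above.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def ai_crawler_access(robots_text, test_path="/"):
    """Map each AI crawler name to True (may fetch test_path) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {bot: parser.can_fetch(bot, test_path) for bot in AI_CRAWLERS}


targeted = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

blanket = """\
User-agent: *
Disallow: /
"""

targeted_result = ai_crawler_access(targeted)
blanket_result = ai_crawler_access(blanket)
print(targeted_result)
print(blanket_result)
```

The second sample shows the accidental case: a blanket wildcard Disallow blocks every AI crawler even though none of them is named in the file.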

How to use this tool

  1. Paste your robots.txt content into the text area, or use fetch mode to retrieve it from any domain
  2. Click Validate to run all checks and see the results
  3. Review the validation checks for errors, warnings, and informational notes
  4. Use the URL tester to check whether specific pages on your site are blocked or allowed
  5. Review the AI crawler detection section and check whether GPTBot and similar crawlers have appropriate access

Frequently asked questions

What is a robots.txt file?
A robots.txt file is a text file placed in the root directory of a website that tells web crawlers which pages or sections of the site they should not crawl. It uses the Robots Exclusion Protocol, a widely adopted standard that search engines and other crawlers follow voluntarily. It is a crawling directive, not an access control mechanism and not an indexing control. Pages disallowed in robots.txt are not necessarily kept private, and their URLs can still appear in search results if other pages link to them; they are simply not crawled by compliant bots.
Should I block AI crawlers like GPTBot in my robots.txt?
This is a strategic decision with real implications for generative engine optimisation (GEO). Blocking AI crawlers asks compliant crawlers not to fetch your content, which keeps it out of the model training and real-time retrieval that those crawlers feed. For most content publishers who want to be cited and surfaced by AI systems, blocking AI crawlers is counterproductive. For publishers with proprietary content they want to protect from AI training, blocking may be appropriate. The key is to make this decision deliberately rather than accidentally.
What is the difference between Disallow and Allow in robots.txt?
Disallow tells crawlers not to access a specific path or set of paths. Allow explicitly permits access to a path that would otherwise be covered by a broader Disallow rule. When both match a URL, the rule with the longer path wins, and an Allow rule wins a tie against a Disallow rule of the same length. A common pattern is to Disallow the entire /wp-admin/ directory while using an Allow rule for /wp-admin/admin-ajax.php, the admin AJAX endpoint that WordPress requires for some front-end functionality.
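The WordPress pattern can be checked with Python's standard `urllib.robotparser`, as in the sketch below. The file contents and the example.com URLs are illustrative assumptions. The Allow line is listed first on purpose: the standard-library parser uses the first matching rule in file order, so this ordering gives the same answer as crawlers that pick the longest match.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt for a WordPress site (assumed contents).
wordpress_robots = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(wordpress_robots.splitlines())

ajax_ok = parser.can_fetch("Googlebot", "https://example.com/wp-admin/admin-ajax.php")
admin_ok = parser.can_fetch("Googlebot", "https://example.com/wp-admin/options.php")
post_ok = parser.can_fetch("Googlebot", "https://example.com/blog/hello-world/")
print(ajax_ok, admin_ok, post_ok)
```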
Does robots.txt affect my SEO?
Yes, significantly. Blocking important pages from crawling stops search engines from reading their content, so those pages are unlikely to rank well even if the bare URL is still indexed through links from other pages. On larger sites, leaving admin and low-value pages open to crawling wastes crawl budget. A well-configured robots.txt file directs crawl budget toward your most important content and away from pages that add no SEO value. A misconfigured robots.txt is a common cause of otherwise unexplained organic traffic drops.
What is crawl-delay and does Google respect it?
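Crawl-delay is a robots.txt directive that tells crawlers how many seconds to wait between requests to your server. It is not part of the core Robots Exclusion Protocol, so support varies by crawler: Bingbot honours it, but Googlebot ignores it. The crawl rate limiter in Google Search Console was retired in early 2024, so it is no longer an alternative; if Googlebot is overloading your server, Google's documentation suggests temporarily returning 500, 503, or 429 responses to slow it down.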
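As a rough illustration of how a crawler that honours the directive might read it, Python's standard `urllib.robotparser` exposes the value through `crawl_delay()`. The file contents and the bot name `SomeOtherBot` below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: only the bingbot group declares a Crawl-delay.
robots_txt = """\
User-agent: bingbot
Crawl-delay: 5
Disallow: /search/

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

bing_delay = parser.crawl_delay("bingbot")          # seconds between requests
default_delay = parser.crawl_delay("SomeOtherBot")  # falls back to the * group
print(bing_delay, default_delay)
```

A crawler that finds no Crawl-delay for its group gets `None` back and falls back to its own default pacing.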
