Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file is a plain text file placed in the root directory of a website that tells web crawlers which pages or sections of the site they should not crawl. It uses the Robots Exclusion Protocol, a widely adopted standard (formalised as RFC 9309) that search engines and other crawlers follow voluntarily. It is a crawling directive, not an access control or indexing mechanism. Pages disallowed in robots.txt are not kept private: compliant bots simply do not fetch them, non-compliant bots can ignore the file, and a disallowed URL can still appear in search results if other pages link to it.

Question 2

Should I block AI crawlers like GPTBot in my robots.txt?

Accepted Answer

This is a strategic decision with real implications for generative engine optimisation. Blocking an AI crawler stops compliant bots from fetching your content, which can keep it out of AI training data, out of real-time AI assistant responses, or both, depending on what that particular crawler is used for. Rules apply per user agent, so check each vendor's crawler documentation before deciding which ones to block. For most content publishers who want to be cited and surfaced by AI systems, blocking AI crawlers is counterproductive. For publishers with proprietary content they want to protect from AI training, blocking may be appropriate. The key is to make this decision deliberately rather than accidentally.
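As a minimal sketch of what a per-crawler block looks like in practice, the following uses Python's standard urllib.robotparser to check a sample robots.txt that blocks GPTBot and leaves every other crawler unrestricted. The rule set and the example.com URL are illustrative assumptions, not a recommendation for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("GPTBot", "https://example.com/blog/post"))     # False
print(is_allowed("Googlebot", "https://example.com/blog/post"))  # True
```

Running a check like this before deploying a change is one way to confirm that a block is deliberate and affects only the crawlers you intended.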

Question 3

What is the difference between Disallow and Allow in robots.txt?

Accepted Answer

Disallow tells crawlers not to access a specific path or set of paths. Allow explicitly permits access to a path that would otherwise be covered by a broader Disallow rule. When both kinds of rule match a URL, the most specific rule (the one with the longest matching path) wins, and an Allow rule wins a tie against a Disallow rule of equal length. A common pattern on WordPress sites is to Disallow the entire admin directory while using an Allow rule to permit access to the admin AJAX endpoint, which WordPress requires for some front-end functionality.
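The precedence rule can be sketched in a few lines of Python. This is a simplified illustration, assuming plain path prefixes with no * or $ wildcards and a single user-agent group; the function name is hypothetical, and the paths are assumed to be WordPress's usual /wp-admin/ and /wp-admin/admin-ajax.php. It is written by hand because Python's standard urllib.robotparser applies the first matching rule in file order rather than the longest one.

```python
def is_path_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under simple prefix rules.

    `rules` holds ("allow" | "disallow", path_prefix) pairs for one
    user-agent group. Wildcards (* and $) are deliberately not handled.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # skip empty and non-matching rules
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


# Assumed WordPress-style rules: block the admin area, permit the AJAX endpoint.
WORDPRESS_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

print(is_path_allowed("/wp-admin/admin-ajax.php", WORDPRESS_RULES))  # True
print(is_path_allowed("/wp-admin/options.php", WORDPRESS_RULES))     # False
print(is_path_allowed("/blog/", WORDPRESS_RULES))                    # True
```

Because the longest match decides the outcome, the order in which the two rules appear does not change the result in this sketch.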

Question 4

Does robots.txt affect my SEO?

Accepted Answer

Yes, it can, and significantly. Blocking important pages from crawling stops search engines from reading their content, which usually keeps them from ranking in search results. On large sites, leaving admin and low-value pages open to crawling wastes crawl budget. A well-configured robots.txt file directs crawl budget toward your most important content and away from pages that add no SEO value. A misconfigured robots.txt is a common cause of unexplained organic traffic drops.
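One way to catch this kind of misconfiguration is to test a list of must-rank URLs against the file before it goes live. The sketch below uses Python's standard urllib.robotparser; the find_blocked helper, the rule set, and the example.com URLs are all hypothetical. Note that urllib.robotparser uses simple first-match prefix rules, so treat it as a rough check rather than an exact model of any search engine.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs from `urls` that `user_agent` may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical misconfiguration: "/p" was meant to block /private/ only,
# but as a prefix it also matches /products/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /p
"""

IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/blog/launch",
]

print(find_blocked(ROBOTS_TXT, IMPORTANT_URLS))
# ['https://example.com/products/widget']
```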

Question 5

What is crawl-delay and does Google respect it?

Accepted Answer

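Crawl-delay is a robots.txt directive that tells crawlers how many seconds to wait between requests to your server. It is not part of the core Robots Exclusion Protocol, so support varies: some crawlers, including Bingbot, honour it, and Yandex has historically supported it, but Googlebot ignores it. Google removed the crawl rate setting from Search Console in early 2024. Googlebot now adjusts its crawl rate automatically based on how your server responds, and temporarily returning 500, 503, or 429 status codes signals it to slow down.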
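For crawlers that do honour the directive, reading it is straightforward. The sketch below shows how a polite crawler written in Python might look up its delay with the standard urllib.robotparser module. The rule set, the ExampleBot name, and the one-second fallback are assumptions for illustration, and the code says nothing about how Bingbot or any other real crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Crawl-delay for one named crawler only.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""


def delay_for(user_agent: str, robots_txt: str = ROBOTS_TXT, default: float = 1.0) -> float:
    """Seconds a polite crawler should wait between requests."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(declared) if declared is not None else default


print(delay_for("bingbot"))     # 10.0
print(delay_for("ExampleBot"))  # 1.0 (falls back to the assumed default)
```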
