
Robots.txt Tester - Online Check URL Allow/Disallow


Robots.txt Tester

Check if a URL is allowed or disallowed for any crawler. Paste or fetch any robots.txt to inspect rules instantly.

Enter any URL — we'll locate and fetch its robots.txt automatically.

Frequently Asked Questions

A robots.txt file is a plain-text file placed in the root directory of a website (e.g., /robots.txt) that tells search engine crawlers and other bots which pages or sections of the site they are allowed or disallowed to crawl. It follows the Robots Exclusion Protocol (REP).

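Disallow tells a bot not to crawl a path. Allow overrides a broader Disallow for a specific sub-path. Within the group that applies to a bot, the order of the rules does not matter under RFC 9309 (which Google follows): the most specific matching rule, meaning the one with the longest path pattern, wins, and if an Allow and a Disallow rule match equally, Allow is used. If no rule matches, the path is allowed by default. The * wildcard matches any character sequence, and $ anchors the pattern to the end of the URL.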
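The matching behavior described above can be sketched in Python. This is a minimal illustration, assuming longest-match precedence with Allow winning ties (as in RFC 9309); `pattern_to_regex` and `is_allowed` are hypothetical helper names, not part of any library, and the rules shown are made up for the example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex (hypothetical helper)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """rules is a list of (directive, pattern) pairs, directive being 'allow' or 'disallow'."""
    best = None  # (pattern length, is_allow); tuple comparison lets Allow win ties
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        # re.match anchors at the start of the path, which gives prefix matching.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]


# Example rules (assumed for illustration only).
rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/private/secret.html"))
print(is_allowed(rules, "/private/public/page.html"))
```

The rule list is deliberately written with the broad Disallow first to show that position in the file does not decide the outcome; only the pattern length does.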

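Robots.txt prevents crawling, not indexing. If Google discovers a disallowed URL through external links, it may still index it and show it without a description (for example, "No information is available for this page"). To prevent indexing, use a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header, and leave the page crawlable: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex directive. Do not combine a robots.txt Disallow with noindex on the same URL.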
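As a rough way to check whether a page actually carries a noindex signal, the sketch below inspects both places mentioned above: the X-Robots-Tag response header and the robots meta tag. It is a simplified illustration; `has_noindex` is a hypothetical helper, and it does not handle bot-specific forms such as a header value prefixed with a crawler name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or the meta tag contains a noindex directive."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex({}, page))
```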

The * character serves two purposes in robots.txt: (1) As a user-agent value, User-agent: * means the rules apply to all bots that don't have a more specific block. (2) In path patterns, * matches any sequence of characters. For example, Disallow: /*.pdf$ blocks every URL that ends in .pdf, regardless of directory. Without the trailing $, the same pattern would also match URLs that merely contain .pdf, such as /file.pdf?download=1.

To block all bots from the entire site, use the following robots.txt:
User-agent: *
Disallow: /
This tells every bot (*) that the entire site (/) is disallowed. Note that well-behaved bots respect this, but malicious bots may ignore robots.txt entirely.
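One way to confirm the effect of these two lines is Python's standard-library parser, `urllib.robotparser`. The bot names and the example.com URL below are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Feed the two-line file directly instead of fetching it over the network.
parser.parse(["User-agent: *", "Disallow: /"])

# Every bot name falls back to the * group, so every path is disallowed.
results = {
    bot: parser.can_fetch(bot, "https://example.com/any/page")
    for bot in ("Googlebot", "Bingbot", "SomeOtherBot")
}
print(results)
```

This only shows what a compliant parser concludes; it says nothing about bots that never read the file.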

Crawl-delay specifies the minimum time (in seconds) a bot should wait between requests. Example: Crawl-delay: 10 asks for 10 seconds between requests. It is a non-standard directive and support varies: Googlebot ignores it and sets its own crawl rate, while Bingbot honors it. Check the documentation of any other crawler you care about before relying on it. Use it sparingly; setting it too high can slow down crawling and delay indexing.
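For a crawler written in Python, the value can be read with `urllib.robotparser`, which exposes it through `crawl_delay()`. The sketch below assumes a hypothetical bot name, ExampleBot, and an inline robots.txt rather than a fetched one.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when the matching group has no Crawl-delay line.
delay = parser.crawl_delay("ExampleBot")
wait_seconds = delay if delay is not None else 0
print(wait_seconds)
```

A polite crawler would then call `time.sleep(wait_seconds)` between consecutive requests to the same host.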

To find a site's robots.txt, append /robots.txt to the domain: https://yourdomain.com/robots.txt. If you see a 404 error, the site doesn't have one, which means all bots are allowed to crawl everything. Use this tool to fetch and inspect any site's robots.txt instantly.
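Deriving that location from an arbitrary page URL is a small exercise in URL parsing. The sketch below uses only the standard library; `robots_txt_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("expected an absolute URL such as https://yourdomain.com/page")
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?id=1#top"))
```

Because the file is looked up per scheme and host, a subdomain such as blog.yourdomain.com has its own robots.txt, separate from the one on yourdomain.com.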

Robots.txt does not support regular expressions. It supports only prefix matching with two special characters: * (wildcard for any character sequence) and $ (end-of-path anchor). Character classes, alternation, and other regex features are not available. For more granular control, use meta robots tags or HTTP headers on individual pages.

Common errors include: (1) Blocking CSS/JS files, which Google needs for rendering; (2) Using Disallow: with no path, which blocks nothing (use Disallow: / to block everything); (3) Relying solely on robots.txt to hide sensitive content (use authentication instead); (4) Wrong path format: paths are case-sensitive, prefix-matched, and should start with /; (5) Not testing: always use a tester like this one before deploying!
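Some of these mistakes can be caught mechanically before deploying. The following is a minimal lint sketch, not a complete validator: `lint_robots` is a hypothetical function, and the three checks (empty Disallow, path not starting with / or *, Disallow on a .css or .js pattern) are simple heuristics chosen for illustration.

```python
def lint_robots(text: str) -> list[str]:
    """Return human-readable warnings for a few common robots.txt mistakes."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in ("allow", "disallow"):
            continue
        if field == "disallow" and value == "":
            warnings.append(f"line {number}: empty Disallow blocks nothing")
        elif value and not value.startswith(("/", "*")):
            warnings.append(f"line {number}: path should start with '/'")
        elif field == "disallow" and value.lower().rstrip("$").endswith((".css", ".js")):
            warnings.append(f"line {number}: blocking CSS/JS can break rendering")
    return warnings


sample = "User-agent: *\nDisallow:\nDisallow: private/\nDisallow: /*.css$\nAllow: /public/\n"
for warning in lint_robots(sample):
    print(warning)
```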

Yes, you can set different rules for different bots. A group starts with one or more User-agent: lines followed by its rules, so you can target specific bots with tailored rules. Example: block Googlebot-Image from your /images/ folder while allowing all other bots. To make several bots share the same Allow/Disallow directives, list their User-agent: lines consecutively before the rules.
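The example above, plus a shared group, can be checked with Python's `urllib.robotparser`. BotA and BotB are made-up bot names and example.com is a placeholder host. Note that this standard-library parser matches user-agent names by substring and applies the first matching rule in a group, so it may differ from Google's behavior in edge cases; the simple file below does not hit those cases.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: BotA
User-agent: BotB
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

image_url = "https://example.com/images/cat.png"
private_url = "https://example.com/private/report.html"

checks = {
    "Googlebot-Image on /images/": parser.can_fetch("Googlebot-Image", image_url),
    "Bingbot on /images/": parser.can_fetch("Bingbot", image_url),
    "BotA on /private/": parser.can_fetch("BotA", private_url),
    "BotB on /private/": parser.can_fetch("BotB", private_url),
    "BotA on /images/": parser.can_fetch("BotA", image_url),
}
for label, allowed in checks.items():
    print(label, "->", "allowed" if allowed else "disallowed")
```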