
Robots.txt Generator - Online Create & Validate Robots File


Robots.txt Generator & Validator

Create, preview, validate, and download your robots.txt file instantly

Frequently Asked Questions

A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which pages or files they can and cannot request from your site. It's a key part of technical SEO, helping you control crawl budget, keep crawlers away from duplicate or low-value content, and manage which bots access your site. It is not a security measure: compliance is voluntary, so malicious bots may ignore it, while well-behaved crawlers such as Googlebot and Bingbot respect its directives.

The robots.txt file must be placed in the root directory of your website, making it accessible at https://yourdomain.com/robots.txt. For example, if your site's document root is /public_html/, place the file at /public_html/robots.txt. It must be named exactly robots.txt (all lowercase). Placing it in a subdirectory (e.g., /blog/robots.txt) will not work, because crawlers only check the root.
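To illustrate the root-only rule, here is a minimal Python sketch that derives the robots.txt location for any page URL. The helper name robots_txt_url is hypothetical and the URLs are placeholders; only the standard library's urllib.parse is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that would govern page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=7"))
```

Whatever the depth of the page, the result always points at the root of the same scheme and host, which is where crawlers look.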

  • User-agent: Specifies which crawler the rules apply to. Use * for all crawlers.
  • Disallow: Tells crawlers not to access a specific path. E.g., Disallow: /admin/
  • Allow: Overrides a broader Disallow rule for a specific path. E.g., Allow: /admin/public/
  • Sitemap: Points to your XML sitemap. E.g., Sitemap: https://example.com/sitemap.xml
  • Crawl-delay: Asks a crawler to wait a given number of seconds between requests. This is a non-standard directive: Bing honors it, Google ignores it, and support among other crawlers varies.
  • # Comments: Lines starting with # are ignored by crawlers and used for human-readable notes.
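The directives above can be tried out with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the rules and URLs are invented for illustration, and site_maps() requires Python 3.8 or later. Note that Python's parser applies the first rule that matches, whereas Google applies the most specific (longest) matching rule, so the Allow line is listed before the broader Disallow here to get the same outcome under both.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; paths and sitemap URL are placeholders.
ROBOTS_TXT = """\
# Keep crawlers out of the admin area, except its public part
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Allow is listed first, so it wins for the public sub-path.
public_ok = parser.can_fetch("*", "https://example.com/admin/public/page.html")
# Everything else under /admin/ falls through to the Disallow rule.
secret_ok = parser.can_fetch("*", "https://example.com/admin/secret")

print("public allowed:", public_ok)
print("secret allowed:", secret_ok)
print("crawl delay:", parser.crawl_delay("*"))
print("sitemaps:", parser.site_maps())
```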

robots.txt does not support full regular expressions. It does support two limited wildcard characters: * (matches any sequence of characters) and $ (anchors the pattern to the end of the URL). For example, Disallow: /*.pdf$ blocks every URL that ends in .pdf. Google and Bing both respect these wildcards. For more complex URL matching, you'd need server-side solutions or meta robots tags.
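The two wildcards can be modelled by translating a robots.txt path pattern into a regular expression. The sketch below is an approximation for illustration, not any crawler's actual implementation; the function names are hypothetical. Python's own urllib.robotparser matches plain path prefixes and does not interpret these wildcards, so the re module is used instead.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex (hypothetical helper)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and let each '*' match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def rule_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so an un-anchored pattern acts as a prefix.
    return pattern_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/docs/report.pdf"))
print(rule_matches("/*.pdf$", "/docs/report.pdf?download=1"))
print(rule_matches("/private", "/private/data"))
```

Without the trailing $, the pattern behaves as a prefix match, which is why /private also covers /private/data.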

You can validate your robots.txt file using our Validator tab above. Simply paste your robots.txt content and click "Validate Now" to check for syntax errors, missing directives, conflicting rules, and common typos. Additionally, Google Search Console provides a robots.txt report under "Settings" → "robots.txt", which lists the robots.txt files Google found for your site, when each was last crawled, and any warnings or errors. The older standalone robots.txt Tester has been retired; to check whether a specific URL is blocked for Googlebot, use the URL Inspection tool.
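For a sense of what such checks can look like, here is a minimal linting sketch. It is not the validator used by this tool: the function name lint_robots_txt, the set of known directives, and the messages are all assumptions made for illustration, and it covers only three kinds of problem.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
NEEDS_USER_AGENT = {"disallow", "allow", "crawl-delay"}


def lint_robots_txt(text: str) -> list:
    """Return (line_number, message) pairs for simple problems (hypothetical helper)."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip()
        name = field.lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive '{field}'"))
        elif name == "user-agent":
            seen_user_agent = True
        elif name in NEEDS_USER_AGENT and not seen_user_agent:
            problems.append((number, f"'{field}' appears before any User-agent line"))
    return problems


sample = "Disallow: /a\nUser-agent: *\nDisalow: /b\nnonsense\n"
for number, message in lint_robots_txt(sample):
    print(f"line {number}: {message}")
```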

Common user-agent tokens include: Googlebot (Google's main crawler), Googlebot-Image (Google Images), Bingbot (Bing), Slurp (Yahoo), DuckDuckBot (DuckDuckGo), Baiduspider (Baidu), YandexBot (Yandex), GPTBot (OpenAI's web crawler), CCBot (Common Crawl), AhrefsBot, and SemrushBot. Using * applies rules to all crawlers that support robots.txt. You can find a longer list at robotstxt.org.

You can define separate rule blocks for different crawlers by using multiple User-agent: lines. Each block starts with one or more User-agent declarations followed by the rules for those crawlers, and blocks are conventionally separated by a blank line. If multiple blocks apply to a crawler (e.g., a specific User-agent: Googlebot block and a User-agent: * block), the crawler follows only the most specific block that matches its name; the rules in the * block are not merged in.
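A short sketch with Python's standard urllib.robotparser shows a named block taking precedence over the * block. The rules and URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: Googlebot gets a lighter rule set than everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /drafts/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own block, so the /search/ rule from '*' does not apply.
googlebot_search = parser.can_fetch("Googlebot", "https://example.com/search/")
# Bingbot has no block of its own and falls back to '*'.
bingbot_search = parser.can_fetch("Bingbot", "https://example.com/search/")
# Both blocks disallow /drafts/.
googlebot_drafts = parser.can_fetch("Googlebot", "https://example.com/drafts/post")

print("Googlebot /search/:", googlebot_search)
print("Bingbot /search/:", bingbot_search)
print("Googlebot /drafts/:", googlebot_drafts)
```

Because blocks are not merged, any rule that should apply to every crawler has to be repeated in each named block, as /drafts/ is here.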

Google enforces a maximum robots.txt file size of 500 KiB (kibibytes, i.e., 500 × 1024 bytes). If your file exceeds this limit, Google ignores everything after the first 500 KiB. Most robots.txt files are far smaller than this, so it is rarely an issue. However, if you have an extensive list of URL patterns, be mindful of the limit. Our generator includes a character counter to help you stay within bounds; keep in mind that the limit is measured in bytes, and non-ASCII characters take more than one byte each.
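Since a character count and a byte count can differ, a quick check of the encoded size is a useful complement. This sketch assumes the file is served as UTF-8 and that the limit is counted as 500 × 1024 bytes; the names are hypothetical.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # assumed interpretation of Google's documented limit


def robots_txt_size(text: str) -> int:
    """Size in bytes of the file when encoded as UTF-8 (hypothetical helper)."""
    return len(text.encode("utf-8"))


def within_google_limit(text: str) -> bool:
    return robots_txt_size(text) <= GOOGLE_LIMIT_BYTES


sample = "User-agent: *\nDisallow: /café/\n"
print(len(sample), "characters,", robots_txt_size(sample), "bytes")
print("within limit:", within_google_limit(sample))
```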

A Disallow directive does not guarantee that a page stays out of search results: it only prevents crawling, not indexing. If other pages link to the disallowed URL, or if the URL is discovered through other means, it may still appear in search results (though without a description snippet). To fully prevent a page from appearing in search results, use a <meta name="robots" content="noindex"> tag in the HTML, or return an X-Robots-Tag: noindex HTTP header. Do not combine noindex with a Disallow rule for the same URL: a crawler that is blocked from fetching the page never sees the noindex instruction, so the page must remain crawlable for noindex to take effect.
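As one way to send the X-Robots-Tag header, here is a minimal WSGI sketch using only the Python standard library conventions. The /internal/ path prefix and the response body are placeholders chosen for illustration; a real site would more likely set this header in its web server or framework configuration.

```python
def app(environ, start_response):
    """Tiny WSGI app that marks pages under /internal/ as noindex (illustrative)."""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    # Hypothetical rule: anything under /internal/ should stay out of search results.
    if environ.get("PATH_INFO", "").startswith("/internal/"):
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title><p>Hello</p>"]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server("", 8000, app).serve_forever(). The affected paths must not also be disallowed in robots.txt, or crawlers will never receive the header.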

Search engines typically refresh their cached copy of your robots.txt file about once a day, though this can vary; Google generally caches the file for up to 24 hours. If you make changes, you can request a recrawl of the file from the robots.txt report in Google Search Console. Changes are not instantaneous: it may take a day or two for all crawlers to pick up the update. For urgent blocks, use noindex tags or server-level access controls.