Robots.txt Generator

Create robots.txt files for search engine crawlers with ease.

Example generated robots.txt

User-agent: *
Disallow: /admin
Disallow: /private

Sitemap: https://example.com/sitemap.xml
About robots.txt: This file tells search engine crawlers which pages or files they can or cannot request from your site. Place it in the root directory of your website. Use User-agent: * to apply rules to all crawlers, or specify individual bots like Googlebot.

Robots.txt Reference Guide

1. Set user-agent rules

Use User-agent: * to apply rules to all crawlers, or target specific ones like Googlebot or Bingbot. Rules under each user-agent block only apply to that crawler.
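This per-crawler scoping can be checked with Python's standard-library robots.txt parser. The rules below are illustrative; note that when a crawler matches a specific User-agent block, the * block does not apply to it in addition.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one Googlebot-specific block, one catch-all block.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /no-google/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot is governed only by its own block...
print(parser.can_fetch("Googlebot", "/no-google/page.html"))   # False
# ...so the catch-all Disallow: /private/ does not apply to it.
print(parser.can_fetch("Googlebot", "/private/page.html"))     # True
# Every other crawler falls back to the * block.
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))  # False
```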

2. Define allow/disallow paths

Block admin areas, staging environments, duplicate content, and internal search result pages. Use Allow: directives to override disallowed parent directories for specific paths.
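The override behavior can be verified the same way. One caveat in this sketch: Python's parser honors the first matching rule, so Allow is listed first here; Google instead applies the most specific (longest) matching rule, which gives the same answer for these paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /admin/ is blocked except for /admin/public/.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AnyBot", "/admin/public/help.html"))  # True
print(parser.can_fetch("AnyBot", "/admin/settings.html"))     # False
print(parser.can_fetch("AnyBot", "/blog/post.html"))          # True
```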

3. Add sitemap URL

Include a Sitemap directive pointing to your XML sitemap. This helps crawlers discover all your pages efficiently, especially for large sites or those with pages not linked from the homepage.
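The three steps above can be sketched as a small generator function. The function name and input structure are illustrative assumptions, not the tool's actual implementation.

```python
def generate_robots_txt(rules, sitemaps):
    """Build a robots.txt string.

    rules: dict mapping a user-agent to a list of (directive, path)
    pairs, e.g. {"*": [("Disallow", "/admin")]} -- illustrative shape.
    sitemaps: list of absolute sitemap URLs.
    """
    lines = []
    for agent, directives in rules.items():
        lines.append(f"User-agent: {agent}")
        for directive, path in directives:
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line separates user-agent blocks
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines) + "\n"

# Reproduces the example file shown earlier.
print(generate_robots_txt(
    {"*": [("Disallow", "/admin"), ("Disallow", "/private")]},
    ["https://example.com/sitemap.xml"],
))
```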

Common Robots.txt Directives

| Directive | Example | What it does |
| --- | --- | --- |
| User-agent | User-agent: * | Applies rules to all crawlers (*) or a named bot |
| Disallow | Disallow: /admin/ | Prevents crawling of the specified path and all sub-paths |
| Allow | Allow: /admin/public/ | Explicitly permits a path within a disallowed parent |
| Sitemap | Sitemap: https://example.com/sitemap.xml | Tells crawlers where to find your XML sitemap |
| Crawl-delay | Crawl-delay: 10 | Asks the crawler to wait 10 seconds between requests (not supported by Google) |
| # Comment | # Block staging URLs | Comments are ignored by crawlers; useful for documentation |

Robots.txt vs Noindex: What is the Difference?

A common misconception is that blocking a page in robots.txt prevents it from appearing in search results. It does not. Disallow only tells crawlers not to visit that URL; if other sites link to the blocked page, Google can still discover and index it based on those links alone, even without visiting it.

To prevent indexing entirely, use the <meta name="robots" content="noindex"> tag inside the page's HTML head, or an X-Robots-Tag: noindex HTTP response header. Robots.txt is best used to save crawl budget by keeping bots from wasting time on admin pages, internal search results, and duplicate parameter URLs; it is not a security or privacy mechanism.
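To make the two noindex options concrete, here is a minimal sketch using Python's built-in http.server. The handler name and port are hypothetical; the response carries both the HTTP header and the equivalent meta tag.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoindexHandler(BaseHTTPRequestHandler):
    """Serves a page that crawlers may fetch but must not index."""

    def do_GET(self):
        body = (b'<html><head>'
                b'<meta name="robots" content="noindex">'  # HTML-only option
                b'</head><body>Not for search results</body></html>')
        self.send_response(200)
        # Header option: works for any content type (PDFs, images, ...).
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally (port choice is arbitrary):
# HTTPServer(("127.0.0.1", 8000), NoindexHandler).serve_forever()
```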
