Robots.txt and XML Sitemaps: Practical SEO Basics
Robots.txt and XML sitemaps solve different SEO problems. Robots.txt controls crawler access, while a sitemap lists URLs you want search engines to discover and revisit.
Confusing the two can hide important pages. A URL can be present in a sitemap but blocked in robots.txt, which sends conflicting signals and makes debugging indexing harder.
Keep crawl rules simple
Robots.txt should be easy to read and intentionally boring. Block only areas that should not be crawled, such as internal search results, temporary previews, or private paths that are protected elsewhere.
- Do not block CSS or JavaScript required to render public pages.
- Use a Sitemap line that points to the canonical sitemap URL.
- Keep separate rules for major bots only when there is a clear reason.
- Test the final file before uploading it (see the sketch after this list).
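For instance, a draft like the following can be checked locally with Python's standard-library urllib.robotparser before it is uploaded. The blocked paths, sitemap URL, and test URLs are placeholders for illustration, not recommendations for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft robots.txt: block internal search and previews,
# leave everything else (including CSS/JS assets) crawlable.
draft = """\
User-agent: *
Disallow: /search/
Disallow: /preview/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(draft.splitlines())

# URLs that must stay crawlable, including assets needed to render pages.
must_allow = [
    "https://www.example.com/",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/assets/site.css",
    "https://www.example.com/assets/app.js",
]

for url in must_allow:
    verdict = "ALLOWED" if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(verdict, url)

# Python 3.8+ also exposes the Sitemap lines the parser found.
print(parser.site_maps())
```

If any of the must-allow URLs come back BLOCKED, the rules are too broad and should be narrowed before the file goes live.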
Use sitemaps for discovery
A sitemap should contain canonical public URLs, useful lastmod dates, and no broken links. It is not a fix for thin content, but it helps search engines find the pages you already want indexed.
- List extensionless canonical URLs if that is what the site serves.
- Remove redirected, blocked, duplicate, or 404 URLs.
- Update lastmod only when meaningful page content changes (the sketch after this list shows the date format).
- Submit the sitemap in Search Console after important updates.
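As a rough sketch of the expected structure, a few lines of Python can generate a minimal sitemap in the standard sitemaps.org format. The URLs and dates below are placeholders; on a real site this list would come from the CMS or build pipeline rather than being hard-coded.

```python
import xml.etree.ElementTree as ET

# Placeholder canonical URLs with the date of their last meaningful change.
pages = [
    ("https://www.example.com/", "2024-05-02"),
    ("https://www.example.com/products/blue-widget", "2024-04-18"),
    ("https://www.example.com/guides/robots-txt", "2024-03-30"),
]

# All sitemap elements live in the sitemaps.org namespace.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")

for loc, lastmod in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. YYYY-MM-DD

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```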
Check both files together
When an indexation problem appears, review robots.txt, sitemap.xml, canonical tags, redirects, and the live HTTP status. One file rarely tells the whole story.
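One way to run that review in a single pass is a sketch like the one below: it loads the live robots.txt and sitemap.xml, then reports each sitemap URL's robots.txt verdict and HTTP status. The domain and user agent are placeholders, and a real audit would add rate limiting and canonical tag checks.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"  # placeholder domain

# Load the live robots.txt rules.
robots = RobotFileParser()
robots.set_url(SITE + "/robots.txt")
robots.read()

# Pull every <loc> out of the live sitemap.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
with urllib.request.urlopen(SITE + "/sitemap.xml") as resp:
    tree = ET.parse(resp)
locs = [el.text.strip() for el in tree.findall(".//sm:loc", ns)]

for url in locs:
    allowed = robots.can_fetch("Googlebot", url)
    try:
        # HEAD request; redirects are followed, so compare the final URL too.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as r:
            status, final_url = r.status, r.geturl()
    except urllib.error.HTTPError as e:
        status, final_url = e.code, url
    notes = []
    if not allowed:
        notes.append("blocked by robots.txt")
    if final_url != url:
        notes.append("redirects to " + final_url)
    print(status, url, "; ".join(notes))
```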
Robots.txt and sitemaps work best when they are clear and consistent. They should describe the crawlable version of the site, not fight it.
Open Robots.txt Generator →