Free Technical SEO Tool

Robots.txt & Sitemap Checker

Quickly verify crawlability — robots.txt, sitemap.xml, and indexing risk in one check

We fetch robots.txt and sitemap.xml from the domain root. Both files are public — no auth involved.

How it works

No black box. Here's exactly what the Robots.txt & Sitemap Checker checks.

  1. You enter your domain

    We use only the root (protocol and host). Anything after it is ignored.

  2. We fetch /robots.txt

    We verify it exists and returns 200, then check for accidental "Disallow: /" rules and missing Sitemap directives.

  3. We fetch /sitemap.xml

    Or whatever URL is declared in robots.txt. We confirm it parses, count its URLs, and detect sitemap index files.

  4. You get a verdict + fixes

    Severity-coded issues. Each fix is concrete enough to do in 5 minutes. The full sequence of checks is sketched in code below.
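
Those four steps fit in a short script. Here is a minimal sketch in Python using only the standard library; the function name, severity labels, and heuristics are illustrative, not the tool's actual implementation.

    import urllib.error
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    def fetch(url):
        """Fetch a URL; return (status, body), or (None, "") on network error."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status, resp.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as e:
            return e.code, ""
        except urllib.error.URLError:
            return None, ""

    def check_crawlability(domain):
        # Step 1: reduce the input to its root (protocol + host).
        parts = urllib.parse.urlsplit(domain if "://" in domain else "https://" + domain)
        root = f"{parts.scheme}://{parts.netloc}"
        issues = []

        # Step 2: fetch /robots.txt and look for the classic footguns.
        # (Naive on purpose: a real checker would respect User-agent grouping.)
        status, robots = fetch(root + "/robots.txt")
        sitemaps = []
        if status != 200:
            issues.append(("warning", f"robots.txt returned {status}, expected 200"))
        else:
            lines = [l.strip() for l in robots.splitlines()]
            if any(l.lower().replace(" ", "") == "disallow:/" for l in lines):
                issues.append(("critical", 'robots.txt contains "Disallow: /"'))
            sitemaps = [l.split(":", 1)[1].strip()
                        for l in lines if l.lower().startswith("sitemap:")]
            if not sitemaps:
                issues.append(("warning", "no Sitemap directive in robots.txt"))

        # Step 3: fetch the declared sitemap, falling back to /sitemap.xml.
        sitemap_url = sitemaps[0] if sitemaps else root + "/sitemap.xml"
        status, body = fetch(sitemap_url)
        if status != 200:
            issues.append(("critical", f"{sitemap_url} returned {status}"))
        else:
            try:
                tree = ET.fromstring(body)
                kind = "index" if tree.tag.endswith("sitemapindex") else "sitemap"
                # Count <loc> entries: URLs, or child sitemaps in an index.
                count = sum(1 for el in tree.iter() if el.tag.endswith("loc"))
                print(f"{kind} with {count} <loc> entries")
            except ET.ParseError:
                issues.append(("critical", f"{sitemap_url} is not valid XML"))

        # Step 4: the verdict.
        for severity, message in issues:
            print(f"[{severity}] {message}")

    check_crawlability("example.com")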

Why this matters

A misconfigured robots.txt or missing sitemap is the most expensive small bug in SEO, because it silently kills your visibility for weeks before anyone notices. The classic horror story: someone copies a staging robots.txt with "Disallow: /" into production. The site disappears from Google. Three weeks pass. By the time anyone catches it, rebuilding the lost rankings takes months.

  • "Disallow: /" or "noindex" left over from a staging environment is the #1 silent SEO killer.
  • A missing or empty sitemap forces crawlers to discover URLs through internal links only — orphan pages never get indexed.
  • Sitemap declarations in robots.txt are still the canonical way to point crawlers at your sitemap, even with Search Console submitted.
  • Sitemap index files matter for sites over ~50,000 URLs — Google rejects single sitemaps over that limit.
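
To make the first bullet concrete, here is the staging leftover next to a harmless production default. The difference is a single character (comments are valid robots.txt syntax):

    # Staging leftover: blocks every crawler from every URL
    User-agent: *
    Disallow: /

    # Empty Disallow: allows everything
    User-agent: *
    Disallow: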

Want the full story across every page?

The Robots.txt & Sitemap Checker checks one URL. CrawlTide audits your whole site, tracks issues over time, watches your AI Visibility weekly, and pushes meta-tag fixes straight to your CMS.

No credit card. Free tier covers a small site end-to-end.

Frequently asked questions

Do I really need a robots.txt?
You don't — without one, crawlers default to "allow everything." But you almost always want one for two reasons: declaring your sitemap location, and blocking specific paths (admin areas, search-result pages, faceted-navigation URLs that produce infinite duplicates).
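
For illustration, a minimal robots.txt that does both jobs; the paths and sitemap URL are placeholders for your own:

    User-agent: *
    Disallow: /admin/
    Disallow: /search
    Disallow: /*?sort=

    Sitemap: https://yourdomain.com/sitemap.xml

The * wildcard in the last Disallow rule is supported by Google and other major crawlers.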
Where should robots.txt and sitemap.xml live?
robots.txt must live at the domain root (yourdomain.com/robots.txt); that is the only place crawlers look for it. sitemap.xml conventionally lives at the root too (yourdomain.com/sitemap.xml), but you can host it elsewhere as long as you declare the location in robots.txt or submit it in Search Console.
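
For example, this single robots.txt line (the path is a placeholder) points crawlers at a sitemap outside the root:

    Sitemap: https://yourdomain.com/assets/sitemaps/main.xml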
How big can my sitemap be?
Single sitemap: max 50,000 URLs and 50 MB uncompressed. Larger sites use a sitemap index file that points to multiple sitemap files (each under the limits). Most CMSes do this automatically.
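
For illustration, a minimal sitemap index file in the standard sitemaps.org format; the file names are placeholders:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://yourdomain.com/sitemaps/pages-1.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://yourdomain.com/sitemaps/pages-2.xml</loc>
      </sitemap>
    </sitemapindex>

Each child sitemap must itself stay under the 50,000-URL / 50 MB limits.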
Will this catch noindex tags too?
Not in this tool — those are per-page meta tags. The AI Visibility and SEO Score tools both flag noindex on the page they audit. CrawlTide's full audit flags noindex across your whole site after a crawl.
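
For reference, the per-page signals in question look like this: a robots meta tag in the page's <head>,

    <meta name="robots" content="noindex">

or, for non-HTML resources, the equivalent HTTP response header:

    X-Robots-Tag: noindex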
My sitemap is dynamically generated — will this still work?
Yes, as long as it returns 200 with valid XML. We don't care whether it's a static file or generated on demand by your framework (Next.js sitemap.ts, Rails routes, etc.).
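
To a crawler, a sitemap generated on request is indistinguishable from a static file. A toy sketch in Python with Flask; the route and hard-coded URLs stand in for a real lookup such as a database query:

    from flask import Flask, Response

    app = Flask(__name__)

    # Serve a sitemap built at request time; yourdomain.com is a placeholder.
    @app.route("/sitemap.xml")
    def sitemap():
        urls = ["https://yourdomain.com/", "https://yourdomain.com/pricing"]
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            + "".join(f"<url><loc>{u}</loc></url>" for u in urls)
            + "</urlset>"
        )
        return Response(xml, mimetype="application/xml")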
Should I block AI crawlers like GPTBot?
Depends on your goals. If you want to be cited in ChatGPT and similar, allow GPTBot. If you don't want OpenAI training on your content, disallow it. Most SaaS marketing sites benefit from being visible. The full CrawlTide product can audit per-bot rules across all 30+ AI crawlers.
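
If you decide to opt out, the rule is two lines (GPTBot is OpenAI's documented crawler token; other AI crawlers use their own tokens):

    User-agent: GPTBot
    Disallow: /

No explicit rule is needed for the opposite case: any crawler you don't mention falls through to your User-agent: * group, and with no matching rules everything is allowed.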