Indexability — robots, sitemap, and noindex leaks

The indexability check confirms a site is crawlable: robots.txt is present and not blocking all crawlers, and sitemap.xml is present. Most importantly, it hunts for an accidental noindex on the homepage that would silently de-index a live site. It's an infrastructure check, not a marketing-SEO tool.

The single check here, “Robots & sitemap / indexability”, fetches three things, following the apex↔www redirect to the real site: /robots.txt, /sitemap.xml, and the homepage HTML. Its highest priority is catching a noindex leak: it scans the homepage HTML for a robots or googlebot meta tag with content=…noindex…, and the homepage response for an X-Robots-Tag: noindex header. If it finds either, the check fails with “Homepage is set to NOINDEX / silently de-indexed”. That result overrides everything else, because a noindex on a live page is the worst outcome regardless of how tidy the robots.txt is.
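To make the noindex scan concrete, here is a minimal sketch in Python. It is not the product's implementation: the names RobotsMetaParser and has_noindex are hypothetical, and the substring match on "noindex" is a simplifying assumption (it does not treat a bot-scoped X-Robots-Tag value differently from a global one).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def has_noindex(html, headers):
    """Return True if the homepage HTML or its response headers carry noindex.

    `headers` is assumed to be a plain dict of response header names to values.
    """
    # Header check: an X-Robots-Tag value may list several directives.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Meta tag check: only robots/googlebot meta tags count.
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)
```

For example, has_noindex('<meta name="robots" content="noindex, nofollow">', {}) returns True, while a page whose only meta tag is a description returns False even if the description text happens to contain the word.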

With no noindex leak, the grading is straightforward. It passes when robots.txt is present and isn't blocking all crawlers, and sitemap.xml is present. It warns when robots.txt is missing or blocks all crawlers, or when the sitemap is missing. If the homepage can't be fetched at all, the check is recorded as an error.
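The grading rules can be summarized as a small decision function. This is a sketch under assumptions: the function name, the boolean inputs, and the result labels "pass", "warn", "fail", and "error" are hypothetical, and it assumes an unfetchable homepage is reported as an error before any noindex test, since there is no page to scan in that case.

```python
def grade_indexability(homepage_fetched, noindex_found, robots_present,
                       robots_blocks_all, sitemap_present):
    """Map the fetched signals to a single result, mirroring the rules in the text."""
    # Assumption: without a homepage response there is nothing to scan for noindex.
    if not homepage_fetched:
        return "error"
    # A noindex leak overrides the robots.txt and sitemap results.
    if noindex_found:
        return "fail"
    # Missing or all-blocking robots.txt, or a missing sitemap, is a warning.
    if not robots_present or robots_blocks_all or not sitemap_present:
        return "warn"
    return "pass"
```

Checking noindex before the robots.txt and sitemap signals is what makes a leak win even when everything else is healthy.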

Warning

A noindex leak overrides everything and fails the check. If the homepage carries a noindex meta tag or X-Robots-Tag header, that's an immediate fail no matter how healthy robots.txt and the sitemap are — because it means the page is being silently dropped from search. This is the headline value of the check: catching an accidental noindex that a staging deploy or a CMS toggle left on a live client page.

Note

This is an infrastructure check, not a marketing-SEO tool. It confirms search engines can technically reach and index the site — it doesn't grade keywords, content, or rankings. (Marketing-SEO signals were removed in the shift to MSP-focused monitoring.)

Note

This check is what the “site deindexed” per-signal alert watches. If the homepage flips to noindex, that alert fires so you find out the moment a live page starts dropping out of search, not weeks later when traffic craters.

Frequently asked questions

What does the indexability check look at?

It fetches /robots.txt, /sitemap.xml, and the homepage HTML (following the apex↔www redirect). It passes when robots.txt is present and not blocking all crawlers, a sitemap is present, and there is no noindex leak. A missing robots.txt, a robots.txt that blocks all crawlers, or a missing sitemap produces a warning; a noindex leak is a fail; an unfetchable homepage is recorded as an error.
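As an illustration of what "blocking all crawlers" can mean in practice, the sketch below uses Python's standard urllib.robotparser. The function name robots_blocks_all is hypothetical, and treating "the wildcard user agent may not fetch /" as the definition of blocking everything is an assumption, not a statement of how the check is implemented.

```python
from urllib.robotparser import RobotFileParser


def robots_blocks_all(robots_txt):
    """Return True if the rules deny the wildcard user agent access to "/"."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", "/")
```

A file containing "User-agent: *" followed by "Disallow: /" is reported as blocking, while an empty Disallow line or a rule that only covers a subdirectory such as /admin/ is not.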

What's a noindex leak and why does it override everything?

A noindex leak is a live page carrying a robots/googlebot meta tag with noindex, or an X-Robots-Tag: noindex header — which tells search engines to drop the page. Because that silently de-indexes a live site, it's an immediate fail that overrides the robots.txt and sitemap results.

Is this an SEO tool?

No. It's an infrastructure check. It confirms search engines can technically crawl and index the site; it doesn't grade keywords, content, or rankings. Marketing-SEO signals were removed when the product shifted to MSP-focused monitoring.

Does this trigger an alert?

Yes. This is the signal the “site deindexed” per-signal alert fires on. If the homepage flips to noindex, that alert goes out so you catch it immediately rather than after traffic drops.


Indexability — robots, sitemap, and noindex leaks — Domain Watchdog