Google’s John Mueller answered a question on Reddit about a seemingly false “noindex detected in X-Robots-Tag HTTP header” error reported in Google Search Console for pages that do not have that particular X-Robots-Tag or any other related directive or block. Mueller suggested some possible causes, and several Redditors offered reasonable explanations and solutions.
Noindex Detected
The person who started the Reddit discussion described a situation that may be familiar to many: Google Search Console reports that it couldn’t index a page because the page was blocked from indexing (which is different from being blocked from crawling), yet checking the page shows no noindex meta element and no robots.txt rule blocking the crawl.
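For reference, the noindex signal Google is reporting here is delivered as an HTTP response header, `X-Robots-Tag: noindex`, sent by the server (or by anything sitting in front of it, such as a CDN). The equivalent in-page form is `<meta name="robots" content="noindex">`, which is what you would look for in the HTML source.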
Here’s how they described their situation:
- “GSC shows “noindex detected in X-Robots-Tag http header” for a large portion of my URLs. However:
- Can’t find any noindex in the HTML source
- No noindex in robots.txt
- No noindex visible in the response headers when testing
- Live Test in GSC shows the page as indexable
- Site is behind Cloudflare (We have checked page rules/WAF etc.)”
They also reported that they had tried spoofing Googlebot and tested various IP addresses and request headers, and still found no clue as to the source of the X-Robots-Tag.
Cloudflare Suspected
One of the Redditors in that discussion suggested troubleshooting whether the problem originated from Cloudflare.
They offered comprehensive step-by-step instructions for diagnosing whether Cloudflare or anything else was preventing Google from indexing the page:
“First, compare the Live Test vs. the Crawled Page in GSC to check if Google is seeing an outdated response. Next, check Cloudflare’s Transform Rules, Response Headers, and Workers for modifications. Use curl with the Googlebot user-agent and cache bypass (Cache-Control: no-cache) to check server responses. If using WordPress, disable SEO plugins to rule out dynamic headers. Also, log Googlebot requests on the server and check if X-Robots-Tag appears. If all fails, bypass Cloudflare by pointing DNS directly to your server and retest.”
The OP (original poster, the one who started the discussion) responded that they had tried all of these suggestions but were unable to test a cached version of the site via GSC, only the live site (served from the actual server, not Cloudflare).
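For anyone who wants to reproduce the header check described above, here is a minimal Python sketch of the same idea as the curl step. The URL is a placeholder, and the result may still differ from what Google sees, since the request comes from your own IP address rather than Google’s infrastructure.

```python
# Fetch a page while identifying as Googlebot and bypassing caches,
# then report whether an X-Robots-Tag header is present in the response.
import requests

url = "https://example.com/affected-page/"  # placeholder: use an affected URL

headers = {
    "User-Agent": (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ),
    "Cache-Control": "no-cache",
}

response = requests.get(url, headers=headers, timeout=30)
print("Status code:", response.status_code)
# requests exposes response headers as a case-insensitive dictionary.
print("X-Robots-Tag:", response.headers.get("X-Robots-Tag", "not present"))
```

As the OP found, a spoofed request like this may still not match what Google actually receives, which is where the next approach comes in.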
How To Test With An Actual Googlebot
Interestingly, the OP stated that they were unable to test their site using Googlebot, but there is actually a way to do that.
Google’s Rich Results Tester uses the Googlebot user agent, and its requests originate from a Google IP address. This tool is useful for verifying what Google sees. If an exploit is causing the site to display a cloaked page, the Rich Results Tester will reveal exactly what Google is indexing.
Google’s Rich Results support page confirms:
“This tool accesses the page as Googlebot (that is, not using your credentials, but as Google).”
401 Error Response?
The following probably wasn’t the solution, but it’s an interesting bit of technical SEO knowledge.
Another user shared their experience of a server responding with a 401 error. A 401 response means “unauthorized”; it happens when a request for a resource is missing authentication credentials or the provided credentials are not the correct ones. Their solution for clearing the blocked-from-indexing messages in Google Search Console was to add a rule in robots.txt blocking crawls of the login page URLs.
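A minimal robots.txt sketch of that kind of rule, assuming the login pages live under a hypothetical /login/ path (the user didn’t share their exact URLs):

```
User-agent: *
Disallow: /login/
```

Note that blocking crawling this way keeps Googlebot from requesting the URLs that return 401s; it is not the same thing as a noindex directive.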
Google’s John Mueller On GSC Error
John Mueller dropped into the discussion to offer his help diagnosing the issue. He said that he has seen this issue come up in relation to CDNs (Content Delivery Networks). Interestingly, he said he has also seen it happen with very old URLs. He didn’t elaborate on that last point, but it seems to suggest some kind of indexing bug related to old indexed URLs.
Here’s what he said:
“Happy to take a look if you want to ping me some samples. I’ve seen it with CDNs, I’ve seen it with really-old crawls (when the issue was there long ago and a site just has a lot of ancient URLs indexed), maybe there’s something new here…”
Key Takeaways: Noindex Detected In X-Robots-Tag Header
- Google Search Console (GSC) may report “noindex detected in X-Robots-Tag http header” even when that header isn’t present.
- CDNs, such as Cloudflare, may interfere with indexing. Steps were shared for checking whether Cloudflare’s Transform Rules, Response Headers, or cache are affecting how Googlebot sees the page.
- Outdated indexing data on Google’s side may also be a factor.
- Google’s Rich Results Tester can verify what Googlebot sees because it uses Googlebot’s user agent and IP address, revealing discrepancies that might not be visible when merely spoofing a user agent.
- 401 Unauthorized responses can prevent indexing. A user shared that their issue involved login pages that needed to be blocked via robots.txt.
- John Mueller suggested CDNs and historically crawled URLs as potential causes.