Google Not Fully Respecting Robots.txt File?
Google does not appear to be fully respecting robots.txt right now. I’ve encountered a few cases of this today, including one on Google’s own Blogger.com.
Checking Blogger’s robots.txt file shows a short list of disallowed URLs:
# robots.txt for http://www.blogger.com
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
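To double-check how these rules read, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules are copied from the file above; the helper name `is_allowed` and the default `Googlebot` user agent string are my own choices for illustration. Note that this only tests what the rules say about crawling a URL. It says nothing about what a search engine has actually indexed.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the Blogger robots.txt shown above.
ROBOTS_TXT = """\
# robots.txt for http://www.blogger.com
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
"""


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the rules above permit user_agent to fetch url.

    is_allowed is a hypothetical helper written for this post, not part
    of any Google or Blogger tooling.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/profile-find.g", "/comment.g", "/email-post.g", "/"):
        url = "http://www.blogger.com" + path
        status = "allowed" if is_allowed(url) else "disallowed"
        print(url, status)
```

Because the file uses `User-agent: *`, the three listed paths should come back as disallowed for any crawler, while every other path on the site is allowed.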
However, a Google search using the query “site:blogger.com profile find” returns exactly the disallowed URL as its first result. Note that it is indexed, but is apparently not cached: the search listing has no snippet.
Although the page is not being cached, the fact that it is being indexed at all shows that Google is not fully respecting robots.txt! This seems to be a recent development, and hopefully it is just a bug that will soon be patched, rather than a change in Google’s behavior.