Digital Sapien Blog

Google Not Fully Respecting Robots.txt File?





Google does not appear to be fully respecting robots.txt right now. I've run into a few cases of this today, including on Google's own Blogger.com.

Checking Blogger’s robots.txt file shows a short list of disallowed URLs:

# robots.txt for http://www.blogger.com
 
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
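A quick way to confirm that these rules really cover the URL in question is Python's standard-library urllib.robotparser. The sketch below feeds it the rules quoted above and asks whether a crawler identifying as "Googlebot" may fetch each path. The user-agent string and the extra "/" path (added for contrast) are illustrative assumptions, and robotparser's simple prefix matching is Python's own implementation, not Google's.

```python
from urllib.robotparser import RobotFileParser

# The rules from Blogger's robots.txt, as quoted above
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "Googlebot" has no group of its own here, so the "User-agent: *" group applies.
# "/" is not in the original file; it is included only as an allowed contrast case.
for path in ("/profile-find.g", "/comment.g", "/email-post.g", "/"):
    url = "http://www.blogger.com" + path
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(url, verdict)
```

Running it should report the three listed paths as disallowed and the root path as allowed, because each Disallow line matches any URL path that starts with it.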





However, a Google search for the query "site:blogger.com profile find" returns the following:

[Screenshot: Google search results for "site:blogger.com profile find"]

As you can see, the first result is exactly the disallowed URL. Note that it is indexed but apparently not cached: the listing has no snippet.

Although the page is not being cached, the fact that it is indexed at all shows that Google is not fully respecting robots.txt! This seems to be a recent development. Hopefully it is just a bug that will be fixed soon, rather than a deliberate change in Google's behavior.
