Google Not Fully Respecting Robots.txt File?

Google does not appear to be fully respecting robots.txt right now. I’ve encountered a few cases of this today, including one on Google’s own Blogger.com.

Checking Blogger’s robots.txt file shows a short list of disallowed URLs:

# robots.txt for http://www.blogger.com
 
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
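
To double-check that these rules really do cover the URL in question, here is a rough sketch using Python's standard urllib.robotparser module. It assumes the rules are exactly the ones listed above (pasted in as a string rather than fetched live), and the helper name is_allowed and the "Googlebot" user-agent string are my own choices for illustration. Note that it only reports whether a crawler is permitted to fetch a URL; it says nothing about what ends up in a search index.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the listing above (assumption: not fetched live,
# so the real file may differ by the time you read this).
ROBOTS_TXT = """\
# robots.txt for http://www.blogger.com

User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
"""


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the rules above permit user_agent to fetch url.

    is_allowed is a hypothetical helper name; "Googlebot" is an assumed
    user-agent string, which here falls under the "User-agent: *" group.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The disallowed profile search page versus the site root.
    print(is_allowed("http://www.blogger.com/profile-find.g"))
    print(is_allowed("http://www.blogger.com/"))
```

Under these rules the first call should report that fetching is not permitted and the second that it is, which matches a plain reading of the file.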

However, a Google search for the query “site:blogger.com profile find” returns the following:

[Screenshot: Google search results for “site:blogger.com profile find”]

As you can see, the first result returned is exactly the disallowed URL. Note that it is indexed, but is apparently not cached – there is no search listing snippet.

Although the page is not being cached, the fact that it is being indexed at all suggests that Google is not fully respecting robots.txt. This seems to be a recent development, and hopefully it is just a bug that will soon be patched, as opposed to a change in Google’s behavior.
