Digital Sapien Blog » Archive of 'Dec, 2007'

XML Sitemaps Work Pretty Damn Fast on Google


The timeline goes something like this.

12:30am - Published blog entry “Why I Chose Online Education”. As the most recent article, the entry now leads off my homepage and also has its own permanent page.

12:33am - Updated XML sitemap to reflect new page and pinged Google about the update.
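For anyone who hasn't built one, here is a minimal sketch of what "updating the XML sitemap" involves: regenerating a sitemaps.org-style file that lists each page's URL and last-modified date. The post URL below is hypothetical (the real permalink isn't given here), and this sketch only builds the file. The ping step is left out because it depends on whichever notification endpoint the search engine offers.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string for an iterable of (url, lastmod_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> uses W3C Datetime; a plain YYYY-MM-DD date is allowed.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical entries: the homepage plus a placeholder URL for the new post.
today = date.today()
sitemap_xml = build_sitemap([
    ("http://www.digitalsapien.com/", today),
    ("http://www.digitalsapien.com/hypothetical-post-url/", today),
])

if __name__ == "__main__":
    print(sitemap_xml)
```

The homepage gets a fresh lastmod too, since a new entry changes what appears there.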

12:57am - The new entry “Why I Chose Online Education” and the updated homepage content appear in Google searches restricted to Digitalsapien.com pages. My search query was: site:www.digitalsapien.com online education

Now that’s fast for just indexing the new page… but what about ranking alongside other sites?


1:19am - Shocked to discover that Google ranks my entry 138th out of 1,640,000 listings for the phrase “ellis college online education”.

Now that’s pretty damn fast. It’s one more reason why every site should be using an XML sitemap and pointing Google and Yahoo toward it. So all you webmasters out there: go to Google Webmaster Tools, register, and submit your sitemap.
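Besides submitting through Webmaster Tools, the sitemaps.org protocol lets a site advertise its sitemap with a "Sitemap:" line in robots.txt, so any crawler that reads robots.txt can find it. Below is a small sketch using Python's standard-library robots.txt parser to confirm such a line is picked up. The robots.txt contents and the sitemap location are assumptions for illustration, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow everything and advertise the sitemap.
# The sitemap URL is an assumed location, not taken from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.digitalsapien.com/sitemap.xml
"""


def advertised_sitemaps(robots_txt):
    """Return the list of sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(advertised_sitemaps(ROBOTS_TXT))
```

An empty "Disallow:" means nothing is blocked, so this file only serves to point crawlers at the sitemap.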

Google Not Fully Respecting Robots.txt File?


Google does not appear to be fully respecting robots.txt right now. I’ve encountered a few cases of this today, including on Google’s own Blogger.com.

Checking Blogger’s robots.txt file shows a short list of disallowed URLs:

# robots.txt for http://www.blogger.com
 
User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
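To double-check that these rules really do block the URL in question, here is a quick sketch that feeds the quoted rules to Python's standard-library robots.txt parser and asks whether a crawler identifying as "Googlebot" may fetch each path. This uses Python's implementation of the robots exclusion rules, not Google's own crawler logic, and the "/" homepage path is just a control case added for comparison. Since the file has no Googlebot-specific group, the "User-agent: *" rules apply.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above from Blogger's robots.txt.
BLOGGER_ROBOTS = """\
# robots.txt for http://www.blogger.com

User-agent: *
Disallow: /profile-find.g
Disallow: /comment.g
Disallow: /email-post.g
"""

parser = RobotFileParser()
parser.parse(BLOGGER_ROBOTS.splitlines())


def is_allowed(path, agent="Googlebot"):
    """True if the parsed rules let `agent` fetch this path on blogger.com."""
    return parser.can_fetch(agent, "http://www.blogger.com" + path)


if __name__ == "__main__":
    for path in ["/profile-find.g", "/comment.g", "/email-post.g", "/"]:
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

The three listed paths come back disallowed and the homepage comes back allowed, which is what you'd expect from a plain prefix match on the Disallow lines.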

However, a Google search using the query “site:blogger.com profile find” returns the following:

[Screenshot: Google results for “site:blogger.com profile find”]

As you can see, the first result returned is exactly the disallowed URL. Note that it is indexed but apparently not cached: the listing has no snippet.

Although the page is not being cached, the fact that it is being indexed at all shows that Google is not fully respecting robots.txt! This seems to be a recent development, and hopefully it is just a bug that will soon be patched, rather than a deliberate change in Google’s behavior.
