Studying Spider Crawl Rate to Find Your Most Trusted Pages

  • November 8, 2007
Patrick Altoft

Director of Strategy

One of the tools I find very valuable on both my own sites and those of certain clients is my Crawl Rate Analysis Program (CRAP, for short). It logs every single visit by a search engine spider into a database so that I can analyse which pages are being spidered the most and, most importantly, why.
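The post does not show how CRAP itself is built. The following is only a minimal sketch of the logging step, under several assumptions. The server writes Apache "combined" format access logs. Spiders are recognised by the user-agent tokens `Googlebot`, `Slurp` (Yahoo) and `msnbot` (MSN). Visits go into a hypothetical SQLite table called `visits`. The function names `identify_spider` and `log_spider_visits` are illustrative. User agents can be spoofed, so a production version would also confirm genuine bots with a reverse DNS lookup.

```python
import re
import sqlite3
from datetime import datetime

# User-agent tokens for the three big spiders (an assumption; user agents can be
# spoofed, so genuine bots should also be verified with a reverse DNS lookup).
SPIDERS = {"Googlebot": "Google", "Slurp": "Yahoo", "msnbot": "MSN"}

# Apache "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def identify_spider(agent):
    """Return the engine name for a spider user agent, or None for other visitors."""
    for token, engine in SPIDERS.items():
        if token in agent:
            return engine
    return None


def log_spider_visits(lines, conn):
    """Insert one row per spider hit into a hypothetical `visits` table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (engine TEXT, path TEXT, visited_at TEXT)"
    )
    logged = 0
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip malformed or non-matching lines
        engine = identify_spider(match.group("agent"))
        if engine is None:
            continue  # ordinary visitor, not a spider
        visited_at = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        conn.execute(
            "INSERT INTO visits (engine, path, visited_at) VALUES (?, ?, ?)",
            (engine, match.group("path"), visited_at.isoformat()),
        )
        logged += 1
    conn.commit()
    return logged


SAMPLE_LINES = [
    '66.249.66.1 - - [08/Nov/2007:10:15:32 +0000] "GET /green-widgets/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '72.30.1.1 - - [08/Nov/2007:11:02:07 +0000] "GET / HTTP/1.0" 200 8192 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '10.0.0.5 - - [08/Nov/2007:11:05:00 +0000] "GET / HTTP/1.1" 200 8192 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0)"',
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(log_spider_visits(SAMPLE_LINES, conn))  # 2 (the browser visit is skipped)
    print(conn.execute("SELECT engine, path FROM visits ORDER BY engine").fetchall())
```

The sample lines are made up for illustration. Storing one row per hit, rather than running counters, keeps the raw data available so you can later ask new questions of it, such as crawls per page per week.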

Some years ago, when the concept of website trust was still in its infancy, I realised that PageRank, incoming links and TrustRank were all related, but measuring them was very tricky because Google keeps toolbar PR inaccurate. One day I decided to start tracking spider activity, and I quickly found that the pages spidered the most were the ones with the most incoming link juice.

Most SEOs know that more link equity = deeper crawling and deeper indexing, but what most people don’t mention is that more link equity = more frequent crawling of content that is already indexed.

Keeping watch on how spiders hit your pages is hugely valuable for link builders, because it shows exactly which links work and which don’t. Imagine you have a page about Green Widgets that Google has spidered 4 times per week for a year. You can buy a text link for a month and analyse the spider activity to see whether it increases. If there is no increase in spider activity, the link is not a trusted link and you can cancel the subscription. A dramatic increase in spider activity means you should buy more links from that site.
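The before/after test described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's tool. It assumes you already have the dates on which a spider visited the page, and it compares the average weekly crawl rate in equal windows before and after the link went live. The function names `weekly_rate` and `judge_link`, the four-week window, and the 1.25x and 2x thresholds are all hypothetical choices, not values taken from the post.

```python
from datetime import date, timedelta


def weekly_rate(visit_dates, start, end):
    """Average spider visits per week for dates in the half-open range [start, end)."""
    hits = sum(1 for d in visit_dates if start <= d < end)
    return hits / ((end - start).days / 7)


def judge_link(visit_dates, link_start, weeks=4, keep_ratio=1.25, buy_ratio=2.0):
    """Compare crawl rate before and after a paid link went live.

    keep_ratio and buy_ratio are hypothetical thresholds, not tested values.
    """
    window = timedelta(weeks=weeks)
    before = weekly_rate(visit_dates, link_start - window, link_start)
    after = weekly_rate(visit_dates, link_start, link_start + window)
    if before:
        ratio = after / before
    else:
        ratio = float("inf") if after else 0.0  # no baseline crawls at all
    if ratio >= buy_ratio:
        verdict = "buy more links from this site"
    elif ratio >= keep_ratio:
        verdict = "keep watching"
    else:
        verdict = "cancel the subscription"
    return before, after, verdict


if __name__ == "__main__":
    link_start = date(2007, 10, 1)
    # Illustrative data: 4 crawls a week before the link, 10 a week after it.
    before_visits = [link_start - timedelta(days=d) for d in range(1, 29) if d % 7 in (1, 2, 4, 6)]
    after_visits = [link_start + timedelta(days=d) for d in range(28) if d % 7 < 5 for _ in range(2)]
    print(judge_link(before_visits + after_visits, link_start))
    # (4.0, 10.0, 'buy more links from this site')
```

Spider visits fluctuate from week to week, so a longer baseline (the post's example uses a full year) makes a genuine jump easier to tell apart from noise.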

If you want to buy a bunch of deep links, you can point each one at a different test page on your site and measure the spider activity for each page. That gives you a picture of which links are working and which are not passing equity.
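With several test pages, each of which may have received its link on a different day, the comparison turns into a ranking. The sketch below is a hypothetical illustration, not the author's method. It takes spider hits as `(path, date)` pairs and a mapping from each test page to the date its link went live, then ranks the pages by "lift": crawls in the four weeks after the link divided by crawls in the four weeks before it. The add-one smoothing that avoids division by zero is an assumption on my part.

```python
from collections import defaultdict
from datetime import date, timedelta


def rank_test_pages(visits, link_starts, weeks=4):
    """Rank test pages by crawl lift after their deep links went live.

    visits: iterable of (path, date) spider hits.
    link_starts: {test page path: date its link went live} (hypothetical input).
    """
    window = timedelta(weeks=weeks)
    before = defaultdict(int)
    after = defaultdict(int)
    for path, day in visits:
        start = link_starts.get(path)
        if start is None:
            continue  # not one of the test pages
        if start - window <= day < start:
            before[path] += 1
        elif start <= day < start + window:
            after[path] += 1
    # Lift = crawls after / crawls before, with add-one smoothing so a page
    # that was never crawled in the baseline window does not divide by zero.
    lift = {path: (after[path] + 1) / (before[path] + 1) for path in link_starts}
    return sorted(lift.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    start = date(2007, 10, 1)
    # Illustrative data: /test-a/ picks up crawls after its link, /test-b/ does not.
    visits = (
        [("/test-a/", start - timedelta(days=d)) for d in (3, 10, 17)]
        + [("/test-a/", start + timedelta(days=d)) for d in range(11)]
        + [("/test-b/", start - timedelta(days=d)) for d in (2, 9, 16, 23)]
        + [("/test-b/", start + timedelta(days=d)) for d in (1, 8, 15, 22)]
    )
    print(rank_test_pages(visits, {"/test-a/": start, "/test-b/": start}))
    # [('/test-a/', 3.0), ('/test-b/', 1.0)]
```

A lift near 1.0, like `/test-b/` here, points to a link that is not passing equity. Pages at the top of the list point to links worth buying more of.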

The log files for spider activity on BlogStorm during October are here, and you can see that the most trusted pages (the ones with loads of quality incoming links) are being spidered about 10 times per day. Drill down into the actual spider logs by clicking on the number of crawls and you can see that Yahoo spiders my site heavily, while Google visits less often. MSN visits least often, hitting the homepage about once per day.
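Figures like "about 10 times per day" come from dividing each page's crawl count for a month by the number of days in that month, split by search engine. The following is a minimal sketch of that aggregation. It assumes spider hits are available as `(engine, path, date)` tuples, and `crawls_per_day` is a hypothetical helper. The data in the example is invented for illustration and is not BlogStorm's October log.

```python
import calendar
from collections import Counter
from datetime import date


def crawls_per_day(visits, year, month):
    """Average daily crawls per (engine, path) for one calendar month, busiest first.

    visits: iterable of (engine, path, date) tuples, e.g. rows from a spider log.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    counts = Counter(
        (engine, path)
        for engine, path, day in visits
        if day.year == year and day.month == month
    )
    rates = {key: n / days_in_month for key, n in counts.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative data only: Google hits the homepage twice a day, MSN once a day.
    october = [date(2007, 10, d) for d in range(1, 32)]
    visits = [("Google", "/", d) for d in october for _ in range(2)]
    visits += [("MSN", "/", d) for d in october]
    visits.append(("Yahoo", "/", date(2007, 11, 1)))  # outside October, ignored
    for (engine, path), rate in crawls_per_day(visits, 2007, 10):
        print(f"{engine:7} {path:20} {rate:.1f} crawls/day")
```

Keying on the engine as well as the path keeps Yahoo's heavier crawling from hiding how often Google or MSN visits a given page.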

Interestingly, Google sometimes visits the BlogStorm homepage 13 times per day; I guess it likes my content.

Some of the pages with a lot of link equity are category pages that receive a sitewide link, but the most valuable data comes from looking at the links pointing to the blog posts with the most spider activity.
