
Learning from Google Webmaster Tools Caffeine data

Some of you may have noticed that your link counts in Google Webmaster Tools have increased recently. We are seeing numbers up to 1,000 times higher than previous figures.

The reason for this is very interesting. Following the rollout of the new Caffeine infrastructure, Google is able to spider sites far deeper than before, and it is now reporting this increased activity in Webmaster Tools. You can see this by looking at your internal links – if you have 50,000 internal links to your homepage then it’s a fair assumption you have 10,000 pages on your site. This number is probably a lot higher than it was last month.

When you take into account the increase in internal links being reported, it’s quite clear why Google is now reporting a lot more external links too, especially when you think about sitewide links.

This number is here to stay according to Google:

Yes, we revamped the data behind the backlinks feature in Webmaster Tools — it has started using more data from “Caffeine” for some sites and is planned to continue with a bit more data in the next week or so. The goal is to have more fresher & up-to-date data there :-).

Another very interesting piece of data we are seeing is that, although Google is reporting a lot more pages within Webmaster Tools, the number of pages returned by the site: query on Google has dropped for quite a few sites we monitor. This drop appears to be the result of Google being better able to determine which pages are worth displaying in the index.

Our observations can be summed up as follows:

  • Google is spidering big sites a lot more than before – both in terms of volume of pages and frequency
  • Google is much better at deciding which pages are worth showing in the index
  • A lot of pages that are spidered are not being indexed if they are low value (see Mayday update)
  • Pages that previously were indexed but not ranking are now not indexed (but still being spidered)

Looking deeper

One of the main impacts this has had on SEO is that the way people audit websites is now pretty much invalid. In the past people would do a site: query on Google to look through all the indexed pages and find errors and problems from there. This is not accurate anymore because Google is not allowing low value pages to get into the index as much.

We downloaded all our clients' internal links (Google lets you have up to around 100MB) and found (using Excel) that there were quite a few rogue pages that were being spidered and counted as internal links but not indexed – these pages were wasting PageRank and diluting the impact of the good pages. We would never have spotted these pages by looking at indexed pages alone.
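We did this comparison in Excel, but the same check can be scripted. The following is only a minimal sketch in Python, not the exact process we used. It assumes you have two CSV files, one exported from the internal links report and one listing the URLs you know to be indexed, and that each has a column named "url". That column name, the example.com URLs and the function names are all hypothetical, so adjust them to match your own export.

```python
import csv
import io


def normalise(url):
    """Trim whitespace and a trailing slash so near-identical URLs compare equal."""
    url = url.strip()
    return url.rstrip("/") or "/"


def load_urls(csv_text, column="url"):
    """Read one column of URLs from CSV text into a set of normalised URLs.

    The column name "url" is an assumption; use whatever your export calls it.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalise(row[column]) for row in reader if row.get(column)}


def spidered_not_indexed(internal_links_csv, indexed_csv, column="url"):
    """Return URLs that appear in the internal links data but not in the indexed list."""
    spidered = load_urls(internal_links_csv, column)
    indexed = load_urls(indexed_csv, column)
    return sorted(spidered - indexed)


# Made-up sample data standing in for the two exports.
internal_links = (
    "url\n"
    "http://example.com/\n"
    "http://example.com/products\n"
    "http://example.com/print?id=7\n"
)
indexed_pages = (
    "url\n"
    "http://example.com\n"
    "http://example.com/products/\n"
)

for rogue in spidered_not_indexed(internal_links, indexed_pages):
    print(rogue)
```

With the sample data above, the only URL printed is the print-view page, which is the kind of rogue page you would then want to block or consolidate. For real exports you would read the files from disk and pass their contents in.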

By Patrick Altoft at 10:09AM on Tuesday, 13 Jul 2010

Patrick is the Director of Strategy at Branded3 and has spent the last 11 years working on the SEO strategies of some of the UK's largest brands. Patrick’s SEO knowledge and experience is highly regarded by many, and he’s regularly invited to speak at the world’s biggest search conferences and events.


11 Responses to “Learning from Google Webmaster Tools Caffeine data”

  1. idietcola says:

    Speed of indexing seems a lot quicker – as in, the links I now create, even from deep pages, are being found and seem to be impacting the SERPs sooner…

    To me it seems that the SERPs have also taken more relevant TLDs into account. No more can I simply throw some easy but juicy .com links up and tap a few UK directories to them to shove up a .co.uk; it's almost like the juicy links now have to be .co.uk. I know this was the case before, but that is what I boil this update down to as a link builder.

    Great read, although I can report the other way – I'm seeing large volumes of links dropping off from GWMT… Maybe, as you suggest, they may have dropped. I simply assumed some site-wides had thankfully fallen off, as all the pages I refer to have a use in their own right…

    Maybe they have split data centers or something… so instead of collaborating the English data centers, they are now more… independent.

    Thanks for the read!

  2. racing games says:

    Very interesting. I just went to my Google Webmaster Tools and saw a significant increase in my site's backlinks (both internal and external). I can't tell exactly how much, but perhaps about 100 times.

    But I don't see any decrease in the site: query. According to Webmaster Tools there are 1930 URLs in the web index, and there are 1910 results for the site: query.

  3. PROSEO says:

    @racing games: same for us!

  4. It's interesting, since although they show a count of thousands of links, they'll only show a handful.

  5. danielzane says:

    very good daniel zane uk

  6. jhon says:

    Very cool, pity the same can't be said for the interview process and the lack of any courtesy in letting unsuccessful candidates know that they had not been selected…..shame to spoil a good news story but this is not the first time this has happened with Connect.

  7. Mark says:

    We've noticed an increase as well. Not quite 1000 times higher though!

  8. Eric Watts says:

    This is great! Was a bit frustrating having to wait for ages for links to show up under webmaster tools. This is a very large step in the right direction!

  9. Hampers says:

    I think Google is always trying to improve on what they have, so that when you search for something the results are even more relevant than they were before. Matt Cutts has mentioned before that they would always be trying to get better and more relevant results. Good article.

  10. [...] is an interesting blog post concerning Learning from Google Webmaster Tools Caffeine data [...]

  11. [...] Learning from Google Webmaster Tools Caffeine data [...]
