Using multiple sitemaps to analyse indexation on large sites

One of the easy wins in improving search traffic to a large site is to improve indexation. Indexation isn’t about the raw number of pages indexed; it’s about increasing the percentage of real, high-value pages that are indexed.

Forcing Google to index useless pages that won’t get any traffic isn’t going to help things.

Indexation is quite a straightforward issue: every site has an indexation cap based on a number of factors, including:

  • PageRank
  • Trust
  • Site / server speed
  • Duplicate content

The last one is harder to explain, but basically if Google sees loads of pages that are the same, it probably won’t crawl the site as deeply as it would if it found a lot of high-value, unique pages.

Monitoring indexation with the site: command every month is good, and looking at the number of pages that receive at least one visitor each month is better, but both of these methods only look at the site as a whole. What we need is a way of breaking the numbers down so we can see which pages are not indexed and figure out how to improve things.
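As a minimal sketch of that second check – assuming you can export a list of organic landing pages from your analytics package as a CSV with a landing_page column, which is an assumption rather than any specific tool’s format – counting the pages that got at least one visit might look like this:

```python
# Sketch: count how many distinct pages received at least one visit last month,
# from a hypothetical CSV export of organic landing pages.
# The file name and column name are illustrative assumptions.
import csv

def pages_with_visits(csv_path="organic-landing-pages.csv"):
    pages = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pages.add(row["landing_page"])
    return len(pages)

print(pages_with_visits())
```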

Multiple sitemaps

This is where using multiple sitemaps comes in – rather than using one giant sitemap, what we like to do is create a separate sitemap for each type of page on the site.

That way we can look at the number of pages indexed for each page type and immediately see, for example, that 76% of product pages are indexed but only 43% of the lower-level paginated category pages are.

Once you can diagnose exactly which types of pages Google doesn’t want to index, you can fix the issue by improving PageRank flow to those pages and adding more unique content.
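Once Google Webmaster Tools is reporting submitted and indexed counts for each sitemap, the diagnosis itself is simple arithmetic. A minimal sketch, using made-up numbers that match the example above:

```python
# Sketch: compare indexation rates per page type using the submitted/indexed
# counts reported for each sitemap. The figures below are illustrative only.
sitemap_stats = {
    "product-pages": {"submitted": 120000, "indexed": 91200},          # 76%
    "paginated-category-pages": {"submitted": 8000, "indexed": 3440},  # 43%
}

for page_type, stats in sorted(sitemap_stats.items()):
    rate = stats["indexed"] / stats["submitted"]
    flag = "  <-- needs more PageRank flow / unique content" if rate < 0.5 else ""
    print(f"{page_type}: {rate:.0%} indexed{flag}")
```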

Some ideas for the type of pages you might like to look at separately:

  • New products this month
  • Top selling products
  • Pages in French/English/German etc
  • Products that have not been selling
  • Blog posts from a particular month/year
  • Product pages
  • Category pages
  • Paginated category pages (page 2 of 10 etc)
  • Products in a certain category
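As a rough sketch of how this might be put together – the URL-classification rules, filenames and domain below are assumptions to adapt to your own site, not a prescribed implementation – you could generate one sitemap per page type plus a sitemap index file that ties them together, so Webmaster Tools reports the indexation of each child sitemap separately:

```python
# Sketch: split a site's URLs into one sitemap per page type, plus a sitemap
# index file pointing at them. The classification rules, filenames and domain
# are illustrative assumptions - adapt them to your own URL structure.
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def classify_url(url):
    """Very rough page-type rules; replace with your own URL patterns."""
    if "page=" in url:
        return "paginated-category-pages"
    if "/product/" in url:
        return "product-pages"
    if "/category/" in url:
        return "category-pages"
    if "/blog/" in url:
        return "blog-posts"
    return "other-pages"

def write_sitemaps(urls, base="https://www.example.com"):
    groups = {}
    for url in urls:
        groups.setdefault(classify_url(url), []).append(url)

    # One sitemap file per page type.
    for page_type, group in groups.items():
        with open(f"sitemap-{page_type}.xml", "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(f'<urlset xmlns="{SITEMAP_NS}">\n')
            for url in group:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")

    # A sitemap index pointing at the per-type files, so Webmaster Tools
    # can report submitted vs indexed counts for each one separately.
    with open("sitemap-index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write(f'<sitemapindex xmlns="{SITEMAP_NS}">\n')
        for page_type in groups:
            f.write(f"  <sitemap><loc>{base}/sitemap-{page_type}.xml</loc></sitemap>\n")
        f.write("</sitemapindex>\n")
```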

Thanks to John from web development leeds for the screenshot.

By Patrick Altoft. at 4:38PM on Sunday, 28 Mar 2010

Patrick is the Director of Strategy at Branded3 and has spent the last 11 years working on the SEO strategies of some of the UK's largest brands. Patrick’s SEO knowledge and experience is highly regarded by many, and he’s regularly invited to speak at the world’s biggest search conferences and events. Follow Patrick Altoft on Twitter.

31 Responses to “Using multiple sitemaps to analyse indexation on large sites”

  1. Rune Risom says:

    Brilliant post, Patrick. Really useful info and it will be put to work immediately :)

    Thanks a lot!

  2. Stephen Webb says:

    This is an interesting way to increase SEO rankings, and although it appears to require more work and analysis than normal, the results seem to outweigh the costs.

    Grouping pages and creating an individual sitemap for each group type is a good way to analyse how Google sees your site, and develop its page ranking further. This process would obviously be easier for smaller sites, but I wonder how it could be implemented on ecommerce sites with many products and categories.

    I’ll be interested to see the comments regarding this, and how many try and implement it. Maybe some feedback on their results in the future would be a useful resource to see how successful this process is.

  3. David Bain says:

    Interesting angle – an excellent method of segmenting conversion rates on different parts of your website. I like to do something similar – hosting multiple blogs on the same domain, hence being able to submit each blog to blog directories and create a new sitemap for each blog.

  4. It’s quite a new perspective on the use of sitemaps. Will try and see what results I get.

  5. Tim says:

    Great post. Will be trying this out.

  6. Ian says:

    One addition: If you use an index sitemap, then Google Webmaster Tools does the work for you, showing you the sub-maps and the indexation of each.

  7. Chris says:

    Ian, can you expound on what you mean by “index sitemap?” That sounds interesting.

  8. ian says:

    Google lets you create a single, central sitemap that points to other ones. You can search for ‘index sitemap’ and I think you’ll find it.

  9. [...] Patrick from Branded3 has just written a post on this very topic – Patrick’s approach using separate XML sitemaps for different site sections is well worth a read and complements what I’ve written about here very [...]

  10. Here’s my experience with a very large site – 74 million pages. Google decided that 72 million of them were duplicate content owing to substantial content being the same, although technically unique. There was one index file and several thousand individual sitemaps. The low-quality pages were in distinct sitemaps, but this did not get the remaining sitemaps crawled substantially. Google did honour the priority hint to some degree (the low-quality pages were 0.2, while the better pages were given a higher priority hint).

    Then we removed the low-quality sitemaps. No noticeable improvement to the indexing of the ‘good’ pages. Then we removed the low-quality pages themselves, even though there was no sitemap for them. Things are slowly picking up now in terms of indexing.

  11. Diagnosing Google Crawl Allowance Using Webmaster Tools & Excel SEO Technique Help W3C Tag says:

    [...] Patrick from Branded3 has just written a post on this very topic – Patrick’s approach using separate XML sitemaps for different site sections is well worth a read and complements what I’ve written about here very [...]

  12. Jordan says:

    Can anyone else comment on what Ash stated? I would love to hear whether the existence of thin pages, even if they are not linked to, can harm a site.

  13. Richard says:

    I note your comment that “if Google sees loads of pages that are the same then it probably won’t bother to do as deep a crawl of the site” and wonder if you have any evidence of this.

    I was under the impression that crawl depth was correlated with page rank so if (for example) you have a PR4 home page leading to a PR2 catalogue page that, in turn, has a thousand PR0 product pages coming from it, why would the presence of those product pages make any difference to crawl depth?

  14. Jordan, a troubling discovery is Google’s recent remark that you can’t rely on robots.txt to keep a page out of the index. You can see this in one of John Mueller’s replies in Google Groups. We had tried to keep the spiders away from the substantially duplicate pages, but the pages were not falling out of the index, at least not at the expected speed.

    On a large site one can’t make substantial or radical changes overnight, which makes SEO at that scale more challenging.

  15. @Richard – In an interview with Matt Cutts by Eric Enge, Matt mentions that if Googlebot sees lots of duplicate content on a site, then it may not do a deep crawl of that site.

  16. Richard says:

    @Ben – I had read the interview with Mr. Cutts and this is what I was alluding to. The impression I get is that crawl depth is less a ‘site-wide’ thing than a page-based thing, i.e. the decision/frequency to crawl deeper from any one page will depend on the PageRank of that page. I agree that if you have a lot of duplicate-content sub-pages these will be low PR and are unlikely to get crawled often, but my experience does not indicate that having these will have any negative impact on the crawl rate of the pages closer to the root. So what I’m saying is that, from a site perspective, a shallow crawl of a deep site with lots of product pages may be no worse, and possibly better, than a deep crawl of a shallow site without the extra pages.

    However if you also have a few yet deeper pages that you really want indexed then I can see there may be a reason for the ‘sculpting’ system you outline above.

  17. [...] there is room for improvement in internal linking. There is currently a very good article on this by Patrick Altoft: Using multiple sitemaps to analyse indexation on large sites via Bilder nun auch in Sitemap hinzufügen | tagSeoBlog Tags: Google Sitemap, Image, Sitemap, [...]

  18. Maxim Burov says:

    Sitemap is gooood !!!

  19. england flag says:

    The impression that crawl depth was correlated with page rank so if you have a PR4. The impression I get is that crawl depth is less a ’site-wide’ thing as a page based thing i.e. the decision/frequency to crawl deeper from any one page will be dependent on the page rank of that page.

  20. [...] nice technique to set about this task is outlined in Using multiple sitemaps to analyse indexation on large sites on Blogstorm. This suggests breaking the sitemap into multiple files, and examining the results [...]

  21. Julian says:

    Yes, nice article. I definitely will try this for some of my websites. Thank you.

  22. Amit says:

    Hi, your tips are really useful.

  23. Steve Baik says:

    Generally, how big does a site have to be to require multiple sitemaps? Thanks from Steve,
    @tigervinci
    We handle Web design in Bellevue
    Sorry, the links in my last post didn’t work

  24. Steve, when the number of URLs reaches 50,000 or the file size of the sitemap.xml file reaches 10 MB, you should start your next one. http://www.sitemaps.org/faq.php#faq_sitemap_size
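As a rough, hypothetical illustration of working within that limit (the file names and splitting rule below are assumptions, not from the comment), a long URL list can be written out in numbered 50,000-URL chunks:

```python
# Sketch: keep each sitemap under the 50,000-URL limit by writing the URL list
# out in numbered chunks (the 10 MB file-size cap still needs checking separately).
from xml.sax.saxutils import escape

MAX_URLS = 50000

def write_chunked_sitemaps(urls, prefix="sitemap-products"):
    for i in range(0, len(urls), MAX_URLS):
        chunk = urls[i:i + MAX_URLS]
        with open(f"{prefix}-{i // MAX_URLS + 1}.xml", "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in chunk:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
```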

  25. Very handy information,
    Thank you

  26. [...] use this feature. This can be particularly useful if you have a really large site and want to use site maps to diagnose crawl issues. For example, you could have a sitemap for each category of products, drilling down into each [...]

  27. [...] Using multiple sitemaps to analyse indexation on large sites [...]

  28. marshjosem says:

    Hi Patrick,

    Good contribution – keep sharing new topics with us. Can you recommend any tool or software that creates multiple sitemaps?

  29. Probiotix says:

    That’s a great post, Patrick, and it’s still very useful.

    There is an interesting post on SEOmoz arguing that multiple XML sitemaps increased indexation, and hence traffic, and a very useful one on Distilled.

    However, I’ve found this post more inspiring and complete. Keep up the good work with more awesome posts like this one.

  30. Matt says:

    I love the idea of this. I’ve been trying to do it for a while. I just need more information on how to create the sub-categorised XML sitemaps. Step by step maybe, haha..
