
Using multiple sitemaps to analyse indexation on large sites

One of the easy wins in improving search traffic to a large site is to improve indexation. Indexation isn’t about the raw number of pages indexed; it’s about increasing the percentage of real, high-value pages that are indexed.

Forcing Google to index useless pages that won’t get any traffic isn’t going to help things.

Indexation is quite a straightforward issue: every site has an indexation cap based on a number of factors, including:

  • PageRank
  • Trust
  • Site / server speed
  • Duplicate content

The last one is harder to explain, but basically if Google sees loads of pages that are the same then it probably won’t bother to do as deep a crawl of the site as it would if it found a lot of high-value, unique pages.

Monitoring indexation using the site: command every month is good, and looking at the number of pages that receive at least one search visitor each month is better, but both of these methods only look at the site as a whole. What we need is a way of breaking the numbers down so we can see which pages are not indexed and work out how to improve things.
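
As a rough illustration of the “at least one visitor” check, here’s a minimal sketch in Python. It assumes two hypothetical inputs – all_urls.txt with one known URL per line, and organic_landing_pages.csv exported from your analytics package with a landing_page column – so the file names and the column name are illustrative rather than anything a particular tool produces:

    import csv

    # Every URL we know the site has (hypothetical flat file, one URL per line).
    with open("all_urls.txt") as f:
        all_urls = {line.strip() for line in f if line.strip()}

    # Landing pages that received at least one organic visit this month
    # (hypothetical CSV export from your analytics package).
    with open("organic_landing_pages.csv", newline="") as f:
        visited = {row["landing_page"].strip() for row in csv.DictReader(f)}

    pages_with_traffic = all_urls & visited
    share = len(pages_with_traffic) / len(all_urls) if all_urls else 0.0
    print(f"{len(pages_with_traffic)} of {len(all_urls)} known pages "
          f"({share:.1%}) received at least one organic visit")

Either way, the output is a single site-wide figure – which is exactly the limitation the approach below addresses.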

Multiple sitemaps

This is where using multiple sitemaps comes in – rather than using one giant sitemap, what we like to do is create a separate sitemap for each type of page on the site.

That way we can look at the number of pages indexed for each page type and immediately see, for example, that 76% of product pages are indexed but only 43% of the lower-level paginated category pages are.
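
As a minimal sketch of that comparison – the submitted and indexed counts are the figures Webmaster Tools reports against each sitemap you register, and the file names and numbers below are made up:

    # Hypothetical (submitted, indexed) counts per sitemap, as reported by
    # Webmaster Tools for each sitemap registered for the site.
    sitemap_counts = {
        "sitemap-products.xml":             (12400, 9424),
        "sitemap-categories.xml":           (850, 780),
        "sitemap-paginated-categories.xml": (3100, 1333),
    }

    for sitemap, (submitted, indexed) in sitemap_counts.items():
        pct = indexed / submitted if submitted else 0.0
        print(f"{sitemap}: {indexed}/{submitted} indexed ({pct:.0%})")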

Once you can diagnose exactly which types of page Google doesn’t want to index, you can fix the issue by improving PageRank flow to those pages and adding more unique content.

Some ideas for the types of page you might like to look at separately (there’s a rough sketch of splitting URLs along these lines after the list):

  • New products this month
  • Top selling products
  • Pages in French/English/German etc
  • Products that have not been selling
  • Blog posts from a particular month/year
  • Product pages
  • Category pages
  • Paginated category pages (page 2 of 10 etc)
  • Products in a certain category
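
If your URLs follow predictable patterns, splitting them out like this can be scripted. The sketch below is purely illustrative – the URL patterns, file names and domain are assumptions you would replace with rules matching your own site – and writes one sitemap per page type plus a sitemap index file that lists them:

    import re
    from datetime import date

    # Illustrative page-type rules; order matters (most specific first).
    PAGE_TYPES = [
        ("paginated-categories", re.compile(r"/category/.+\?page=\d+$")),
        ("categories",           re.compile(r"/category/[^/?]+/?$")),
        ("products",             re.compile(r"/product/[^/?]+/?$")),
    ]

    def classify(url):
        for name, pattern in PAGE_TYPES:
            if pattern.search(url):
                return name
        return "other"

    def write_sitemap(filename, urls):
        # Bare-bones sitemap writer; real URLs should be XML-escaped.
        entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
        with open(filename, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                    f"{entries}\n</urlset>\n")

    def write_index(filename, sitemap_urls):
        # A sitemap index: one file that simply lists the per-type sitemaps.
        entries = "\n".join(
            f"  <sitemap><loc>{u}</loc><lastmod>{date.today()}</lastmod></sitemap>"
            for u in sitemap_urls)
        with open(filename, "w") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                    f"{entries}\n</sitemapindex>\n")

    # all_urls.txt is assumed to hold one URL per line.
    groups = {}
    with open("all_urls.txt") as f:
        for url in (line.strip() for line in f if line.strip()):
            groups.setdefault(classify(url), []).append(url)

    sitemap_files = []
    for page_type, urls in groups.items():
        filename = f"sitemap-{page_type}.xml"
        write_sitemap(filename, urls)
        sitemap_files.append(f"https://www.example.com/{filename}")

    write_index("sitemap-index.xml", sitemap_files)

Submitting the index file means Webmaster Tools reports indexation for each sub-sitemap separately, which is where the per-page-type percentages come from.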

Thanks to John from web development leeds for the screenshot.

By Patrick Altoft at 4:38PM on Sunday, 28 Mar 2010

Patrick is the Director of Strategy at Branded3 and has spent the last 11 years working on the SEO strategies of some of the UK's largest brands. Patrick’s SEO knowledge and experience is highly regarded by many, and he’s regularly invited to speak at the world’s biggest search conferences and events. Follow Patrick Altoft on Twitter.

Comments

  • http://www.kwigger.com Rune Risom

    Brilliant post, Patrick. Really useful info and it will be put to work immediately :)

    Thanks a lot!

  • http://www.crearedesign.co.uk Stephen Webb

    This is an interesting way to increase SEO rankings, and although it appears to require more work and analysis than normal, the results seem to outweigh the costs.

    Grouping pages and creating an individual sitemap for each group type is a good way to analyse how Google sees your site, and develop its page ranking further. This process would obviously be easier for smaller sites, but I wonder how it could be implemented on ecommerce sites with many products and categories.

    I’ll be interested to see the comments regarding this, and how many try and implement it. Maybe some feedback on their results in the future would be a useful resource to see how successful this process is.

  • http://www.26weekplan.com David Bain

    Interesting angle – an excellent method of segmenting conversion rates on different parts of your website. I like to do something similar – hosting multiple blogs on the same domain, hence being able to submit each blog to blog directories and create a new sitemap for each blog.

  • http://www.syscomminternational.com/blog Online Reputation Management

    It’s quite a new perspective on the use of sitemaps. Will try and see what results I get.

  • http://designmilitia.co.uk Tim

    Great post. Will be trying this out.

  • http://www.portentinteractive.com Ian

    One addition: If you use an index sitemap, then Google Webmaster Tools does the work for you, showing you the sub-maps and the indexation of each.

  • http://www.managemyproperty.com Chris

    Ian, can you expound on what you mean by “index sitemap?” That sounds interesting.

  • http://www.conversationmarketing.com ian

    Google lets you create a single, central sitemap that points to other ones. You can search for ‘index sitemap’ and I think you’ll find it.

  • Pingback: Diagnosing Google Crawl Allowance Using Webmaster Tools & Excel | Mobil Seo

  • http://www.netmagellan.com/ Ash Nallawalla

    Here’s my experience with a very large site – 74 million pages. Google decided that 72 million of them were duplicate content owing to substantial content being the same, although technically unique. There was one index file and several thousand individual sitemaps. The low-quality pages were in distinct sitemaps, but this did not get the remaining sitemaps crawled substantially. Google did honour the priority hint to some degree (the low-quality pages were 0.2, while the better pages were given a higher priority hint).

    Then we removed the low-quality sitemaps. No noticeable improvement to the indexing of the ‘good’ pages. Then we removed the low-quality pages even though there was no sitemap for them. Things are slowly picking up now re indexing.

  • Pingback: Diagnosing Google Crawl Allowance Using Webmaster Tools & Excel SEO Technique Help W3C Tag

  • Jordan

    Can anyone else comment on what Ash stated? I would love to hear whether the existence of thin pages, even if they are not linked to, can harm a site.

  • http://www.cheapexperiencedays.com Richard

    I note your comment that “if Google sees loads of pages that are the same then it probably won’t bother to do as deep a crawl of the site” and wonder if you have any evidence of this.

    I was under the impression that crawl depth was correlated with page rank so if (for example) you have a PR4 home page leading to a PR2 catalogue page that, in turn, has a thousand PR0 product pages coming from it, why would the presence of those product pages make any difference to crawl depth?

  • http://www.netmagellan.com/ ashnallawalla

    Jordan, a troubling discovery is Google’s recent remark that you can’t rely on robots.txt to keep a page out of the index. You can see this in one of John Mueller’s replies in Google Groups. We had tried to keep the spiders away from the substantially duplicate pages, but the pages were not falling out of the index, at least not at the expected speed.

    At large sites one can’t make substantial or radical changes overnight, which makes SEO at large sites more challenging.

  • http://bengriffiths.me/ Ben Griffiths

    @Richard – In an interview with Matt Cutts by Eric Enge, Matt mentions that if Googlebot sees lots of duplicate content on a site, then it may not do a deep crawl of that site.

  • http://www.cheapexperiencedays.com Richard

    @Ben – I had read the interview with Mr. Cutts and this is what I was alluding to. The impression I get is that crawl depth is less a ‘site-wide’ thing than a page-based thing, i.e. the decision/frequency to crawl deeper from any one page will depend on the PageRank of that page. I agree that if you have a lot of duplicate-content sub-pages these will be low PR and are unlikely to get crawled often, but my experience does not indicate that having them has any negative impact on the crawl rate of the pages closer to the root. So what I’m saying is that, from a site perspective, a shallow crawl of a deep site with lots of product pages may be no worse, and possibly better, than a deep crawl of a shallow site without the extra pages.

    However if you also have a few yet deeper pages that you really want indexed then I can see there may be a reason for the ‘sculpting’ system you outline above.

  • Pingback: Google Image Sitemap

  • http://all-journal.info Maxim Burov

    Sitemap is gooood !!!

  • Pingback: Using Webmaster Tools to find what’s not indexed

  • http://www.jigniter.com Julian

    Yes, nice article. I definitely will try this for some of my websites. Thank you.

  • http://seo4rank.blogspot.com Amit

    Hi, your tips are really useful.

  • http://www.tigervinci.com/ Steve Baik

    Generally, how big does the site have to be to require multiple sitemaps? Thanks from Steve,
    @tigervinci
    We handle web design in Bellevue.
    Sorry, the links in my last post didn’t work.

  • http://www.netmagellan.com/ ashnallawalla

    Steve, when the number of URLs reaches 50,000 or the file size of the sitemap.xml file reaches 10 MB, you should start your next one. http://www.sitemaps.org/faq.php#faq_sitemap_size
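
    To make that limit concrete, here is a minimal, illustrative sketch of chunking a large URL list into 50,000-URL sitemap files (the input file and naming scheme are made up):

        CHUNK_SIZE = 50_000  # per-file URL limit mentioned above

        def write_sitemap(filename, urls):
            # Bare-bones sitemap writer; real URLs should be XML-escaped.
            entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
            with open(filename, "w") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
                        f"{entries}\n</urlset>\n")

        with open("all_product_urls.txt") as f:  # hypothetical input, one URL per line
            urls = [line.strip() for line in f if line.strip()]

        for i in range(0, len(urls), CHUNK_SIZE):
            write_sitemap(f"sitemap-products-{i // CHUNK_SIZE + 1}.xml",
                          urls[i:i + CHUNK_SIZE])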

  • http://www.websitefacility.co.uk Website Facility

    Very handy information,
    Thank you

  • http://www.websitefacility.co.uk Website Facility

    Quite handy,
    thanks

  • Pingback: SEO | Creative XML Site Maps | One Result

  • Pingback: SEO-SEM-Tools

  • http://www.walldirectory.com/ marshjosem

    Hi Patrick,

    Good contribution – keep sharing new topics with us. Can you recommend any tool or software that creates multiple sitemaps?

  • http://www.probiotixfoods.com Probiotix

    That’s a great post, Patrick, and it’s still very useful.

    There is an interesting post on SEOmoz arguing that multiple XML sitemaps increased indexation, and hence traffic, and a very useful one on Distilled.

    However, I’ve found that this post is more inspiring and complete. Keep up the good work with more awesome posts like this one.

  • http://www.gravitatedesign.com Matt

    I love the idea of this. I’ve been trying to do it for a while. I just need more information on how to create the sub-categorised XML sitemaps. Step by step maybe, haha.