One of the easy wins in improving search traffic to a large site is to improve indexation. Indexation isn’t about the raw number of pages indexed; it’s about increasing the percentage of real, high-value pages that are indexed.
Forcing Google to index useless pages that won’t get any traffic isn’t going to help things.
Indexation is quite a straightforward issue. Every site has an indexation cap based on a number of factors, including:
- Site / server speed
- Duplicate content
The last one is hard to explain, but basically, if Google sees loads of pages that are the same, it probably won’t crawl the site as deeply as it would if it found a lot of high-value, unique pages.
Monitoring indexing with the site: command every month is good, and looking at the number of pages that receive at least one visitor each month is better, but both methods only look at the site as a whole. What we need is a way of breaking the numbers down so we can see which pages are not indexed and figure out how to improve things.
This is where using multiple sitemaps comes in. Rather than using one giant sitemap, what we like to do is use a sitemap for each type of page on the site.
That way we can look at the number of pages indexed for each page type and immediately see, for example, that 76% of product pages are indexed but only 43% of the lower-level paginated category pages are.
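As a rough illustration of the one-sitemap-per-page-type idea, here is a minimal Python sketch that builds a separate sitemap for each page type plus a sitemap index pointing at them. The `example.com` URLs, the page type names and the `sitemap-<type>.xml` file naming are all made up for the example; a real site would pull its URL lists from its own database or CMS.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as a string) pointing at each sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical URLs, already grouped by page type
pages_by_type = {
    "products": [
        "https://www.example.com/product/red-widget",
        "https://www.example.com/product/blue-widget",
    ],
    "categories": ["https://www.example.com/category/widgets"],
    "paginated-categories": ["https://www.example.com/category/widgets/page/2"],
}

# One sitemap per page type, keyed by an assumed file name
sitemaps = {
    f"sitemap-{page_type}.xml": build_sitemap(urls)
    for page_type, urls in pages_by_type.items()
}

# A sitemap index that references every per-type sitemap
sitemap_index = build_sitemap_index(
    f"https://www.example.com/{name}" for name in sitemaps
)
```

Each entry in `sitemaps` would be written to its own file and submitted separately, so the indexed count can be read off per page type. The sketch ignores the protocol's limit of 50,000 URLs per sitemap file; a large page type would need to be split across several files.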
Once you can diagnose exactly which types of pages Google doesn’t want to index, you can fix the issue by improving PageRank flow to those pages and adding more unique content.
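To make that diagnosis concrete, the submitted and indexed counts for each sitemap can be turned into percentages and sorted worst-first. This is only a sketch: the counts below are invented so that they reproduce the 76% and 43% figures used as an example above, and how you collect the real numbers (for instance from your webmaster tools sitemap report) is left open.

```python
def indexation_rates(counts):
    """Map each page type to the percentage of submitted pages that are indexed.

    `counts` maps page type -> (submitted, indexed).
    """
    rates = {}
    for page_type, (submitted, indexed) in counts.items():
        # Guard against an empty sitemap to avoid dividing by zero
        rates[page_type] = round(100 * indexed / submitted, 1) if submitted else 0.0
    return rates


# Hypothetical per-sitemap counts: (pages submitted, pages indexed)
counts = {
    "product": (5000, 3800),
    "paginated_category": (1200, 516),
}

rates = indexation_rates(counts)

# List the worst-indexed page types first, since those need attention
for page_type, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{page_type}: {rate}% indexed")
```

Sorting by rate puts the page types that most need better internal linking or more unique content at the top of the list.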
Some ideas for the types of pages you might like to look at separately:
- New products this month
- Top selling products
- Pages in French/English/German etc.
- Products that have not been selling
- Blog posts from a particular month/year
- Product pages
- Category pages
- Paginated category pages (page 2 of 10 etc)
- Products in a certain category
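Splitting a site along lines like these means assigning every URL to a page type. One possible way to do that, assuming the page type can be read off the URL path, is a list of regular expressions checked in order. The path patterns here are hypothetical and would need to match your own URL structure; groupings such as "top selling products" depend on sales data rather than the URL and would have to come from your database instead.

```python
import re
from collections import defaultdict

# Hypothetical URL patterns; the most specific pattern must come first,
# otherwise a paginated category page would be counted as a plain category.
PAGE_TYPE_PATTERNS = [
    ("paginated_category", re.compile(r"^/category/[^/]+/page/\d+/?$")),
    ("category", re.compile(r"^/category/[^/]+/?$")),
    ("product", re.compile(r"^/product/[^/]+/?$")),
    ("blog_post", re.compile(r"^/blog/\d{4}/\d{2}/[^/]+/?$")),
]


def classify(path):
    """Return the first page type whose pattern matches the URL path."""
    for page_type, pattern in PAGE_TYPE_PATTERNS:
        if pattern.match(path):
            return page_type
    return "other"


def group_by_type(paths):
    """Group URL paths into lists keyed by page type."""
    groups = defaultdict(list)
    for path in paths:
        groups[classify(path)].append(path)
    return dict(groups)


# Example paths (made up for illustration)
paths = [
    "/product/red-widget",
    "/category/widgets",
    "/category/widgets/page/2",
    "/blog/2011/03/spring-sale",
    "/about-us",
]
groups = group_by_type(paths)
```

Anything that falls into the `other` group is worth a look too, since it usually means either a page type that has been overlooked or URLs that should not be in a sitemap at all.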