How to Measure & Analyse Long Tail Search

  • June 16, 2010
Patrick Altoft

Director of Strategy

Now that Google Caffeine is 100% live, and following on from the recent May Day update, I thought it would be good to talk about some of the more advanced aspects of long tail SEO.

First of all, Caffeine is a new infrastructure rather than an algorithm update, so it’s not related to the May Day changes. What Caffeine does is increase the freshness of the Google index by increasing crawl capacity and decreasing the time it takes to get crawled pages live and searchable in the index. For bloggers this might not be a big change for new pages, because blog platforms normally ping Google and get indexed in a couple of minutes anyway. For the rest of the web it should make a big difference and open the door to much fresher long tail results.

The May Day update means that some sites with thin content and a lack of internal links are no longer getting the authority benefits they used to. Google is showing more relevant pages instead, which is certainly a positive step.

What is the Long Tail?

Long tail SEO describes the thousands or millions of search terms that individually generate very little traffic but collectively generate a large percentage (perhaps 70%) of a site’s overall search traffic. Long tail doesn’t mean keyphrases with 4, 5 or 6 words in the phrase. These may fall into the long tail group, but that isn’t always the case. We have some very large 4 word phrases that send thousands of visitors per month, and they are classed as short tail terms.

The best way to classify the terms is to look at the chart below from SEOmoz which breaks search traffic into 3 buckets:

  • Short tail – 18.5%
  • Mid tail – 11.5%
  • Long tail – 70%

These figures are approximate but as long as we are consistent it doesn’t matter too much what we choose.

Long Tail

The next step is to do some analysis to measure your current short, mid & long tail traffic numbers so that you can monitor each month how things improve. We set this up as an advanced segment in Google Analytics as well as an Excel chart and find that the following figures tend to give the percentages we want for most websites.

  • Short tail – keyphrase with 100 or more visits per month
  • Mid tail – keyphrase with 6 to 99 visits per month
  • Long tail – keyphrase with 5 or fewer visits per month

You need to run some figures for your site until you get the percentages in the chart above – don’t forget to remove brand searches.
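If you prefer to run the figures outside Analytics, the bucketing above can be sketched in a few lines of Python. This is only an illustration: it assumes you have exported keyphrases and monthly visits into a dictionary, and the names `classify`, `tail_shares` and `brand_terms` are made up for this example rather than anything Google Analytics provides.

```python
from collections import defaultdict

# Thresholds from the list above (visits per month); tune them per site.
SHORT_TAIL_MIN = 100
MID_TAIL_MIN = 6


def classify(visits):
    """Return the bucket name for a keyphrase with this many monthly visits."""
    if visits >= SHORT_TAIL_MIN:
        return "short"
    if visits >= MID_TAIL_MIN:
        return "mid"
    return "long"


def tail_shares(keyword_visits, brand_terms=()):
    """Return each bucket's percentage of non-brand search visits."""
    totals = defaultdict(int)
    for phrase, visits in keyword_visits.items():
        lowered = phrase.lower()
        # Skip brand searches so they don't inflate the short tail.
        if any(term.lower() in lowered for term in brand_terms):
            continue
        totals[classify(visits)] += visits
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {"short": 0.0, "mid": 0.0, "long": 0.0}
    return {
        bucket: round(100 * totals[bucket] / grand_total, 1)
        for bucket in ("short", "mid", "long")
    }


# Hypothetical export: keyphrase -> visits in the month.
sample = {
    "acme widgets": 900,  # brand search, excluded below
    "blue widgets": 120,
    "cheap blue widgets uk": 40,
    "blue widget 3mm left handed": 3,
    "widget repair leeds sunday": 2,
}
shares = tail_shares(sample, brand_terms=["acme"])
```

Adjust the two thresholds and re-run until the shares come out close to the percentages in the chart, then keep them fixed so that month-on-month comparisons stay consistent.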

Segmenting long tail traffic

Visualising millions of keywords that each send a handful of visitors every month is an impossible task, so we need to segment the data before we can work on improving the numbers. The best way to do this is to split the site into the same sections we create for our multiple sitemaps (see Measuring Indexation below) and, for each section, analyse & monitor the following:

  • Number of pages indexed
  • Number of landing pages receiving > 1 visit per month
  • Number of keywords sending visitors to the section each month

Long & short phrases

Although long tail doesn’t necessarily correspond to the number of words in a keyphrase, it is still very important to track the distribution of keyphrase lengths every month. You should do this in two ways: by setting up filters in Google Analytics, and by exporting all your keyword data to Excel and running a pivot table to show figures such as conversion rate vs keyphrase length and visitor or revenue numbers vs keyphrase length.
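The pivot table can also be approximated in code if Excel struggles with the size of the export. The sketch below assumes each exported row holds a keyphrase with its visits, conversions and revenue; the `length_pivot` name and the `max_words` cut-off are choices made for this example only.

```python
from collections import defaultdict


def length_pivot(rows, max_words=6):
    """rows: iterable of (keyphrase, visits, conversions, revenue).

    Returns a list of (word_count, visits, conversion_rate_percent, revenue),
    one entry per keyphrase length, sorted by length.
    """
    groups = defaultdict(lambda: {"visits": 0, "conversions": 0, "revenue": 0.0})
    for phrase, visits, conversions, revenue in rows:
        # Phrases with max_words or more words share one final bucket.
        words = min(len(phrase.split()), max_words)
        group = groups[words]
        group["visits"] += visits
        group["conversions"] += conversions
        group["revenue"] += revenue

    table = []
    for words in sorted(groups):
        group = groups[words]
        rate = 100 * group["conversions"] / group["visits"] if group["visits"] else 0.0
        table.append((words, group["visits"], round(rate, 2), round(group["revenue"], 2)))
    return table
```

Each tuple corresponds to one row of the pivot table, so the output can be pasted straight back into a spreadsheet for charting.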

Measuring Indexation

The best way to measure indexation on large sites is to split the site into sections and create a separate XML sitemap for each section. Once these are submitted in Webmaster Tools you can quickly see which pages are getting indexed and which are not.
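A minimal sketch of generating one sitemap per section is shown below. It assumes that sections map to the first folder in the URL path and that you already have a full list of URLs; the `sitemap-<section>.xml` file names are just a convention picked for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_by_section(urls):
    """Group URLs by their first path segment ('root' for the home page)."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        groups[segments[0] if segments else "root"].append(url)
    return groups


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_section_sitemaps(urls):
    """Map an illustrative file name such as 'sitemap-shoes.xml' to its XML."""
    return {
        f"sitemap-{name}.xml": build_sitemap(sorted(members))
        for name, members in group_by_section(urls).items()
    }
```

Bear in mind that the sitemap protocol allows at most 50,000 URLs per file, so very large sections need to be split further.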

Multiple sitemaps indexation on large sites

If you find that a particular section has an indexation issue then we need to diagnose what’s going wrong. This gets a bit technical, but the best method we have found is to create a script that checks the indexation status of each page as follows:

  • Check to see if the page is indexed using the info: command on Google
  • Check server logs to see how many times the URL has been spidered in last 30 days
  • Use SEOmoz API to find total links to the page & mozRank

Once we have this data we can look into what might be going on & try to fix it.
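To give a flavour of the server log step only, here is a sketch that counts Googlebot requests per URL over the last 30 days. It is not the script we use: it assumes logs in the common Apache "combined" format and identifies Googlebot purely by its user agent string, which can be spoofed. The `info:` lookup and the SEOmoz API call are deliberately left out.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Matches the timestamp, request path and user agent of a combined-format line.
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, now, days=30):
    """Count requests per path whose user agent mentions Googlebot.

    `now` must be a timezone-aware datetime; older requests are ignored.
    """
    cutoff = now - timedelta(days=days)
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        seen = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if seen >= cutoff:
            hits[match.group("path")] += 1
    return hits
```

Because the result is a `Counter`, looking up a URL that was never crawled simply returns 0, which makes it easy to flag pages in a sitemap that the spider has not visited in the period.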

Measuring number of landing pages

Again, this needs to be done by splitting your site into different sections. You can do it in bulk, but that doesn’t give the right data. The key is to use this method but add a filter that only shows the landing pages from the sub-folder or category you want to analyse.


The final result of your analysis should be a chart that looks something like the one below, taken from one of our ecommerce clients.

Thanks to John from SEO Agency Leeds for taking the screenshots above.