Are Links a Good Proxy for Traffic?
One of the most exciting developments on the web in the last two years has been the increased ability to combine two or more technologies to create something new. We see this most in the Social space, of course, but overlaying data sets could lead to a dramatically increased understanding of the world we live in.
Today, after a delay of some months, I received my Hitwise (now called Experian Hitwise) newsletter. It is one of the very few newsletters that I actually remember signing up to; the other 100 a day never seem to get to my inbox now. Hitwise always tailors the newsletter to me as a UK user, choosing an industry and giving some insight into that industry's market share (in terms of traffic). Today it chose to focus on the UK’s online property websites.
I have often thought it would be interesting to map the link data onto the traffic data to see whether there is a correlation between links and traffic. If we accept that Google is less of a “search engine” and more of a “popularity contest”, then being able to analyse this correlation at the drop of a hat would, I would have thought, be worth quite a lot to someone.
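When all you have is two rankings of the same sites, one common way to put a number on that correlation is Spearman's rank correlation coefficient, rho = 1 - 6·Σd² / (n(n² - 1)), where d is the difference between a site's traffic rank and its link rank. The sketch below is only an illustration: it assumes two complete rankings of the same sites with no ties, and the site names are hypothetical placeholders rather than the Hitwise or link data.

```python
def rank_positions(ordering):
    """Map each site name to its 1-based position in an ordered list."""
    return {name: i + 1 for i, name in enumerate(ordering)}


def spearman_rho(order_a, order_b):
    """Spearman's rank correlation for two complete rankings without ties."""
    if set(order_a) != set(order_b) or len(order_a) != len(order_b):
        raise ValueError("both rankings must contain the same sites exactly once")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two sites to correlate")
    pos_a = rank_positions(order_a)
    pos_b = rank_positions(order_b)
    # Sum of squared rank differences across all sites.
    d_squared = sum((pos_a[site] - pos_b[site]) ** 2 for site in order_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical placeholder rankings, not real traffic or link figures.
traffic_order = ["site_a", "site_b", "site_c", "site_d", "site_e"]
link_order = ["site_a", "site_b", "site_c", "site_e", "site_d"]
print(spearman_rho(traffic_order, link_order))
```

With one adjacent swap among five sites, rho works out at 1 - 6·2/(5·24) = 0.9. Note that rho only measures agreement in order; it says nothing about how large the gaps between the sites are.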
For example, taking the top five companies listed in the Hitwise data, we can map their cumulative back-links over time:
It is interesting to see that, apart from a minor difference between the fourth and fifth positions, the inbound links (counted by linking domain) put the sites in the same order as Hitwise does when it tracks traffic:
What IS markedly different, however, is the comparatively large market share that Rightmove gets compared to its competition. The Hitwise data highlights this (presumably accurately), even though Findaproperty has almost as many links on the internet.
What might this tell us?
For me, intuitively, it suggests that Rightmove have developed much of their traffic from offline branding. If you have been in the UK and seen adverts or looked in newsagents' windows, you probably would not need data to back that up, but it does suggest that there may be some maths that could show the power of a brand offline, by comparing the relative differences in traffic with the relative differences in links.
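A crude version of that maths might compare each site's share of traffic with its share of linking domains: a ratio well above 1 suggests traffic that links alone do not explain, which offline branding could account for. This is a hypothetical heuristic, not an established model, and the figures below are invented purely for illustration.

```python
def share(values):
    """Convert raw counts into each site's fraction of the total."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("total must be non-zero")
    return {site: v / total for site, v in values.items()}


def traffic_to_link_ratio(traffic, linking_domains):
    """Ratio of traffic share to link share for sites present in both data sets."""
    common = traffic.keys() & linking_domains.keys()
    t_share = share({s: traffic[s] for s in common})
    l_share = share({s: linking_domains[s] for s in common})
    ratios = {}
    for s in sorted(common):
        # A site with no linking domains gets an unbounded ratio.
        ratios[s] = t_share[s] / l_share[s] if l_share[s] else float("inf")
    return ratios


# Hypothetical figures: traffic share points and counts of linking domains.
traffic = {"site_a": 40.0, "site_b": 20.0, "site_c": 15.0}
linking_domains = {"site_a": 1000, "site_b": 900, "site_c": 500}

for site, ratio in traffic_to_link_ratio(traffic, linking_domains).items():
    print(f"{site}: {ratio:.2f}")
```

Here site_a comes out at 1.28, site_b at 0.71 and site_c at 0.96. Because both shares are normalised over the same set of sites, the ratios depend on which competitors are included, so a top-five comparison gives a different baseline from the whole market.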
This may not be the only insight, however. One thing I tend to look at is the quality of the inbound links, particularly at the upper end, and I can tell you that, on the surface, Findaproperty.com has several more links from pages which themselves have huge numbers of inbound links. When we look further, however, many of those links come from the same domain: 8 of Findaproperty’s 10 strongest links come from the “This Is” range of sites. By contrast, all 10 of Rightmove’s strongest back-links come from different root domains. I would wager that Rightmove’s back-link profile is spread across a considerably wider range of IP addresses or Class C subnets than Findaproperty’s.
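That kind of concentration check can be sketched by counting how many distinct root domains and /24 ("Class C") subnets appear among a site's strongest links, and how large a share the single most common root domain takes. The hostnames and IP addresses below are placeholders from the reserved example and documentation ranges. The root-domain function uses a tiny hand-written list of two-level suffixes as a stand-in for the full Public Suffix List, so treat it as a simplified sketch.

```python
import ipaddress
from collections import Counter

# Simplified stand-in for the Public Suffix List (assumption for this sketch).
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk"}


def root_domain(host):
    """Reduce a hostname to its registrable root domain (simplified)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def class_c(ip):
    """Return the /24 network an IPv4 address belongs to."""
    return ipaddress.IPv4Network(f"{ip}/24", strict=False)


def link_diversity(links):
    """Summarise how spread out a list of (hostname, ip) back-links is."""
    if not links:
        raise ValueError("need at least one link")
    roots = Counter(root_domain(host) for host, _ in links)
    subnets = {class_c(ip) for _, ip in links}
    return {
        "links": len(links),
        "root_domains": len(roots),
        "class_c_subnets": len(subnets),
        # Fraction of links coming from the most common root domain.
        "top_root_share": roots.most_common(1)[0][1] / len(links),
    }


# Hypothetical strongest links for one site (placeholder names and addresses).
top_links = [
    ("www.example.co.uk", "192.0.2.10"),
    ("homes.example.co.uk", "192.0.2.11"),
    ("news.example.co.uk", "192.0.2.12"),
    ("blog.example.com", "198.51.100.5"),
    ("www.example.org", "203.0.113.7"),
]
print(link_diversity(top_links))
```

If a family of related sites sits on separate root domains but shared hosting, the subnet count may reveal a concentration that the root-domain count misses.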
Further analysis would be required, but I think that’s my point: when you start to map two or more extremely large data sets together in this way, some really powerful new insights become available. This is why we use APIs to share data securely, whether it is Microsoft’s demographic profiling mapped onto Majestic’s link data or Google’s analytics data mapped onto its AdWords data (which, I am assured, they don’t do, by the way). The effect can be incredibly powerful, and large companies are now starting to collaborate on these kinds of interrelationships.