Over the last six months Google has been making more and more use of click data in natural search, and over the weekend we had confirmation that it’s being used to rank Google News stories as well. This makes total sense, but it’s nice to hear it officially.
Josh Cohen is the Senior Business Product Manager for Google News and he was recently interviewed by Eric Enge about various Google News issues including the row with people like Rupert Murdoch.
The whole interview is a must-read, but the ranking section is copied below:
Eric Enge: Are there other things that go into ranking news stories?
Josh Cohen: For article ranking there are a number of signals that we are trying to use: is it original content, is it timely, is it relevant, is this a local story, and is there a local source reporting original content on it? That is again, not always relevant to every single story, but it is something else we will look for. Other questions we ask are, is it novel, or is it just a rehash of an article that was out there before, a story that somebody else broke, you just happen to publish it later. These are things that we look for, hard to do, but increasingly something that we are trying to include in our rankings.
Then, there are also source-specific signals that we try to use. This is where volume comes in: what is the volume of publication of original content in a given category? The example that I would like to use is, looking at the business category, you have got the Wall Street Journal, or Bloomberg, or Reuters, all of whom, any given day, are publishing probably hundreds of original stories in business. By itself, that is a decent signal that this is a quality source in that category.
You can compare that then with their volume of publication of original content in the sports category, you are probably not going to see a whole lot, if any, of original publication there.
I would say another really important signal for us in recent quarters has been the user behavior. Their behavior has become a really helpful signal for us in trying to determine that same trusted quality of a given source. So in a given cluster, the first link will get the most clicks, the second gets less clicks, and the third, the fourth, and so on, keep getting fewer and fewer clicks. But, if you look at a user who comes in, and instead of clicking on that first link which is what they were “supposed to do,” and instead let’s say they click on the fourth link; that is a very strong signal about both the source that they clicked on and also the three sources above it that they didn’t click on, even though they were “supposed to” click on that.
Over time, as you aggregate that information, normalize it for different click positions, you can look at this section-by-section to get a sense of what users feel are the best sources in given categories. Again, sticking with the business example, if I have got some random source as the #1 link in Google News, and Reuters in the #3 link, somebody may come to that and say “Wait a second, this is a business story, I want to see what Reuters has to say, I am clicking on that link in the third spot.”
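To make the idea of normalizing for click position a little more concrete, here is a rough Python sketch. It is purely illustrative and is not Google's method: the log format, the function names and the "actual clicks divided by expected clicks" score are all assumptions made up for this example. The only thing it tries to capture is the point from the interview that a click on a lower link counts for more than a click on the first link, because fewer clicks are expected there.

```python
from collections import defaultdict


def position_baseline(impressions):
    """Average click-through rate at each rank across all logged clusters."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        for rank, _source in enumerate(imp["sources"], start=1):
            shown[rank] += 1
        if imp["clicked_rank"] is not None:
            clicked[imp["clicked_rank"]] += 1
    return {rank: clicked[rank] / shown[rank] for rank in shown}


def source_scores(impressions):
    """Score each (category, source) as actual clicks / expected clicks.

    Expected clicks come from the per-rank baseline, so a source is not
    rewarded simply for having been shown in the first position.
    """
    baseline = position_baseline(impressions)
    actual = defaultdict(float)
    expected = defaultdict(float)
    for imp in impressions:
        for rank, source in enumerate(imp["sources"], start=1):
            key = (imp["category"], source)
            expected[key] += baseline[rank]
            if imp["clicked_rank"] == rank:
                actual[key] += 1
    return {key: actual[key] / expected[key]
            for key in expected if expected[key] > 0}


# Hypothetical click log: each entry is one story cluster shown to one user.
# "RandomSource" and "OtherWire" are invented names for illustration.
log = [
    {"category": "business",
     "sources": ["RandomSource", "OtherWire", "Reuters"], "clicked_rank": 3},
    {"category": "business",
     "sources": ["RandomSource", "OtherWire", "Reuters"], "clicked_rank": 1},
    {"category": "business",
     "sources": ["Reuters", "RandomSource", "OtherWire"], "clicked_rank": 1},
    {"category": "business",
     "sources": ["OtherWire", "Reuters", "RandomSource"], "clicked_rank": 2},
]

scores = source_scores(log)
for (category, source), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{category:10s} {source:14s} {score:.2f}")
```

In this toy log a score above 1 means a source attracts more clicks than its positions alone would predict, and a score below 1 means it attracts fewer. Because the scores are keyed by category, the same source could come out strong in business and weak in sports, which matches the section-by-section view described in the interview.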