+44 113 260 4010 contact@branded3.com

Analysing the Google Panda / Content Farmer update

Last week Google announced a major update which affects around 11.8% of all search queries on Google.com. The update was intended to reduce rankings for low-quality sites commonly known as content farms.

The update has yet to hit the UK; however, as most of the larger content farm sites are US-based, it’s probably not going to have as big an impact over here.

Whilst this update is affecting 11.8% of all queries, it’s not actually affecting very many sites. The algorithm is being applied at the domain level rather than the page level, and the affected sites are so large that even a few hundred of them losing rankings could alter 11.8% of all queries.
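A toy calculation shows how a domain-level penalty can touch a large share of queries while demoting only a few sites. The result lists, the placeholder `.example` domains and the function name `share_of_queries_affected` are hypothetical illustrations, not Google data or Google code.

```python
from typing import Dict, List, Set


def share_of_queries_affected(serps: Dict[str, List[str]], flagged: Set[str]) -> float:
    """Fraction of queries whose results include at least one flagged domain.

    Assumes (as the post argues) that the penalty is applied per domain, so every
    page on a flagged domain is affected, whatever the quality of that one page.
    """
    if not serps:
        return 0.0
    affected = sum(1 for domains in serps.values() if any(d in flagged for d in domains))
    return affected / len(serps)


# Hypothetical result pages: two large article sites show up across many long-tail queries.
serps = {
    "how to have a baby boy": ["articles-a.example", "clinic.example", "forum.example"],
    "debt management plan": ["bank.example", "articles-b.example"],
    "fix a leaking tap": ["diy.example", "plumber.example"],
    "best running shoes": ["shop.example", "articles-a.example"],
}
flagged = {"articles-a.example", "articles-b.example"}
print(share_of_queries_affected(serps, flagged))  # 0.75
```

With only two flagged domains, three of the four toy queries are affected. Real result pages are longer and the sites far bigger, which is why a few hundred domains shifting 11.8% of queries is plausible, although a sketch like this cannot confirm it.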

I will come to our conclusions about how this algorithm works later on, but first we need to understand why Google would need an extra algorithm to combat these types of sites. Traditionally Google would use a combination of link data and on-site optimisation to judge rankings, but for content farms this approach breaks down for a number of reasons.

Firstly, these sites have been around for years: think of article directories like ezinearticles.com and content sites such as eHow.com, which have been around for perhaps 10 years or so. Secondly, they have huge volumes of links: ehow.com has 83,000 linking domains, ezinearticles.com 206,000 and articlesbase.com 120,000. Compare this with the BBC, which has around 500,000 linking domains, and the Daily Mail with 128,000, and you can see that Google would really struggle not to trust these sites based on link data.

Finally, these sites are heavily optimised for the long tail of search. They have thousands of users uploading keyword-rich content every day and were, until now, very good at attracting large amounts of traffic. Ezinearticles admitted they could lose half of their 57 million monthly unique visitors with this update, and articlesbase.com was recently featured on TechCrunch as a site with 20 million monthly visitors.

How the algorithm works

There has been a lot of talk about who has been affected and how the algorithm works, but from what I’ve read there are a lot of misconceptions and theories that just don’t stack up for me.

In an interview with Wired, Matt Cutts & Amit Singhal go into more detail about the algorithm and make some key comments:

Wired.com: How do you recognize a shallow-content site? Do you have to wind up defining low quality content?

Singhal: That’s a very, very hard problem that we haven’t solved, and it’s an ongoing evolution how to solve that problem. We wanted to keep it strictly scientific, so we used our standard evaluation system that we’ve developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: “Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?”

Cutts: There was an engineer who came up with a rigorous set of questions, everything from: “Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?” Questions along those lines.

Singhal: And based on that, we basically formed some definition of what could be considered low quality. In addition, we launched the Chrome Site Blocker [allowing users to specify sites they wanted blocked from their search results] earlier, and we didn’t use that data in this change. However, we compared and it was 84 percent overlap [between sites downloaded by the Chrome blocker and downgraded by the update]. So that said that we were in the right direction.

Wired.com: But how do you implement that algorithmically?

Cutts: I think you look for signals that recreate that same intuition, that same experience that you have as an engineer and that users have. Whenever we look at the most blocked sites, it did match our intuition and experience, but the key is, you also have your experience of the sorts of sites that are going to be adding value for users versus not adding value for users. And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons …

The theory put forward by Tom Critchlow is that by making a site look nicer, moving ads below the fold and generally making it appear more trusted, content farms can optimise their way out of this algorithm. This seems a bit too easy to me: Google is full of clever people, and the idea behind this algorithm is to stop low-quality content from attracting traffic in the large volumes it currently does. If the algorithm works as intended, the only way these sites can win their traffic back is to rewrite large volumes of content.

Google has stated that they are using quality raters to classify low-quality sites, so it’s easy to jump to the conclusion that the human reviewers are deciding the results.

My understanding of this algorithm is that the human data is being used in just the same way as the recent Chrome site blocker data: purely as a tool for machine learning and testing of the algorithm. The breakthrough that Singhal credits, elsewhere in the Wired interview, to a Googler named Panda is undoubtedly related to the paper he published recently detailing how they could judge quality based on CTR and bounce rate on ads.

My theory is that Google has used human reviewers (and now the Chrome data) to build some kind of machine learning model. It looks at the characteristics of all the websites labelled as low quality, works out what they have in common and turns that into an algorithm based on factors such as CTR, bounce rate and perhaps CTR on ads as well.
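As a rough sketch of that theory, the snippet below fits a tiny logistic-regression classifier to rater labels using the three engagement features guessed at in the paragraph above. The training rows, learning rate and epoch count are invented for illustration, and nothing here is Google’s actual model; logistic regression is simply a common, simple choice of classifier.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-site features: (search CTR, bounce rate, ad CTR), each in 0..1.
Features = Tuple[float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], x: Features) -> float:
    """Probability that a site is low quality; w = [bias, w_ctr, w_bounce, w_ad_ctr]."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train(xs: Sequence[Features], ys: Sequence[int],
          lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on mean log loss. ys: 1 = raters said low quality, 0 = fine."""
    w = [0.0, 0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            err = predict(w, x) - y  # derivative of log loss with respect to the logit
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Invented rater-labelled examples: low-quality sites bounce more and get more ad clicks.
xs = [(0.40, 0.30, 0.01), (0.35, 0.25, 0.02), (0.45, 0.35, 0.01),
      (0.20, 0.80, 0.10), (0.15, 0.75, 0.12), (0.25, 0.85, 0.08)]
ys = [0, 0, 0, 1, 1, 1]
w = train(xs, ys)
print(round(predict(w, (0.20, 0.80, 0.10)), 2))  # expect a value above 0.5
```

Once trained, a model like this scores any site from its metrics alone, which fits the argument here: the raters shape the model rather than ranking sites directly.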

Another factor Google is likely to be looking at is bounce rate and time on page, compared not only with other sites ranking for the same query but also with how good a match the page is for that query. If you land on ezinearticles and the page covers exactly what you were looking for, then if the article is good you will probably read it all; if not, you will bounce straight back to Google or click an ad. An article that is really well matched to the query should have good metrics; if it doesn’t, that’s a sign of low quality.
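One way to express that idea is to compare a page’s bounce rate with what its relevance to the query would predict. The straight-line baseline, its 0.3 and 0.9 endpoints, the 0.15 tolerance and all the names below are assumptions made up for this sketch; Google has not said how, or whether, it models this.

```python
from dataclasses import dataclass


@dataclass
class PageVisitStats:
    query_match: float  # hypothetical 0..1 score of how well the page fits the query
    bounce_rate: float  # observed share of visitors going straight back to the results


def expected_bounce(query_match: float, worst: float = 0.9, best: float = 0.3) -> float:
    """Assumed baseline: a perfect match 'should' bounce at `best`, an unrelated page
    at `worst`, with a straight line in between."""
    return worst - (worst - best) * query_match


def quality_shortfall(stats: PageVisitStats) -> float:
    """Positive when visitors bounce more than the page's relevance would predict."""
    return stats.bounce_rate - expected_bounce(stats.query_match)


def looks_low_quality(stats: PageVisitStats, tolerance: float = 0.15) -> bool:
    return quality_shortfall(stats) > tolerance


print(looks_low_quality(PageVisitStats(query_match=0.95, bounce_rate=0.75)))  # True
print(looks_low_quality(PageVisitStats(query_match=0.95, bounce_rate=0.35)))  # False
print(looks_low_quality(PageVisitStats(query_match=0.20, bounce_rate=0.80)))  # False
```

The last case is the point of the comparison: a high bounce rate on a poorly matched page is expected, while the same bounce rate on a well-matched page suggests the content let the visitor down.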

This algorithm can then be tested time and time again in conjunction with the data gained from human reviews and the Chrome plugin.
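Testing a candidate algorithm against rater and blocker data comes down to comparing sets of domains. The sketch below computes precision and recall; the domain names are hypothetical. Singhal’s “84 percent overlap” does not say which set is the denominator, so reading it as recall over the blocked set is only one plausible interpretation, and an assumption.

```python
from typing import Dict, Set


def agreement(demoted: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Compare domains demoted by a candidate algorithm against a reference set,
    such as sites human raters flagged or sites Chrome users blocked.

    precision: share of demoted domains that the reference also flags
    recall:    share of reference domains that the algorithm demoted
    """
    both = demoted & reference
    precision = len(both) / len(demoted) if demoted else 0.0
    recall = len(both) / len(reference) if reference else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical domains, purely to show the arithmetic.
demoted = {"a.example", "b.example", "c.example", "d.example"}
blocked = {"b.example", "c.example", "d.example", "e.example", "f.example"}
print(agreement(demoted, blocked))  # {'precision': 0.75, 'recall': 0.6}
```

Re-running a comparison like this after every tweak, against rater labels the model was not trained on, is one way the repeated testing described above could work.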

I would guess that sites with low-quality content are likely to have the same CTR, read-rate and bounce-rate characteristics whether they have good design or not, so it’s going to be very hard for sites to design their way out of this.

Readability

My theory on why ehow.com has not been hit is simple: its content is a lot more readable. Look at the readability of this article compared to this one and this one (the eHow article against the EzineArticles and ArticlesBase versions, all on how to have a baby boy).

Factors such as whitespace, narrower column width, images and sub-headings all contribute to increasing the number of people who read through to the end of the article. I always read to the end of an ehow article regardless of how good it is, but I’ve never read to the end of something on ezinearticles.

In summary, if you can significantly increase the number of people reading to the end of your articles (across a large percentage of your pages), there is a chance you could lessen the effects of this algorithm. This sounds the same as Tom’s theory, but I think his point was that making the site look more trusted would make it appeal more to the human reviewers, whereas I think the reviewers are only used for building and testing the machine learning part of the new “readability” algorithm.
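If read-through is the signal, it needs measuring per page and then improving across most of a site, because the penalty appears to be applied per domain. This sketch assumes you can export each visit’s maximum scroll depth (0 to 1) from your own analytics; the 0.95 “reached the end” cut-off, the 50% target and the function names are hypothetical choices, not thresholds Google has published.

```python
from typing import Dict, List


def read_through_rate(scroll_depths: List[float], end: float = 0.95) -> float:
    """Share of visits whose maximum scroll depth reached (roughly) the end of the article."""
    if not scroll_depths:
        return 0.0
    return sum(1 for d in scroll_depths if d >= end) / len(scroll_depths)


def share_of_pages_read_through(site: Dict[str, List[float]], target: float = 0.5) -> float:
    """Fraction of a site's pages where at least `target` of visits read to the end."""
    if not site:
        return 0.0
    rates = [read_through_rate(depths) for depths in site.values()]
    return sum(1 for r in rates if r >= target) / len(rates)


# Hypothetical scroll-depth samples for three articles on one site.
site = {
    "/how-to-a": [1.0, 0.96, 0.4, 0.99],
    "/how-to-b": [0.2, 0.3, 0.97],
    "/how-to-c": [0.95, 1.0],
}
print(read_through_rate(site["/how-to-a"]))  # 0.75
print(share_of_pages_read_through(site))     # 0.6666666666666666
```

Tracking the site-wide share rather than a few showcase pages matches the caveat above that the improvement has to hold across a large percentage of pages.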

By Patrick Altoft at 9:58am on Saturday, 5 March 2011

Patrick Altoft is Director of Search at Branded3 and has worked in the SEO industry for over 10 years. With experience across some of the world’s largest brands as well as startup businesses, Patrick is well known in the industry and speaks regularly at the major SEO conferences and events. Follow Patrick on Twitter or Google+.

Comments

  • buyer beware

    On Google, the page 1 result at no. 7 for “debt management plan” is the company the OFT is seeking to ban. From the OFT’s press release of 22nd Feb 2011: “Separately, the OFT is also taking action to seek to compulsorily vary the company’s existing trading name ‘The UK Insolvency Helpline’ and to refuse its application to add the trading names ‘www.ivahelpline.co.uk’ and ‘www.insolvencyhelpline.co.uk’ to its licence. This action is subject to final determination by an OFT Adjudicator.”

    Dangerous sites with dodgy domains and .org sites that are a page of affiliate links to Amazon are far more harmful to Google’s reputation than boring results.

    http://www.google.co.uk/#hl=en&xhr=t&q=debt+management+plan&cp=16&pf=p&sclient=psy&aq=0&aqi=&aql=&oq=debt+management+pl&pbx=1&fp=1d3db6eacb0cad36

  • http://www.seoworkers.com John S. Britsios (aka Webnauts)

    I do not know if you understand the term readability. Otherwise you would not show those articles as an example.

  • http://www.blogstorm.co.uk Patrick Altoft

    John – readability means how easy it is to read and make sense of a text. The use of sub-headings and narrower column widths means ehow is easier to read than the other examples. I’m not saying the content is good (in most cases it’s awful), but I bet more people read to the end of the article.

  • http://www.artisanmanagement.com ARTiSAN

    The aim of this latest update is to penalise low quality content sites (irrespective of whether they’re easy to read or not). There’s far too much rubbish on some directory sites so this approach has to be a good thing. It’s now up to the directory sites (and similar sites) to up their game.

  • http://www.distilled.co.uk Tom Critchlow

    Hi Patrick – nice article. I actually didn’t realise quite how much traffic some of these sites got! Pretty crazy…

    I agree that content readability is one of the factors, but I’ve also seen examples of sites with amazing content get hit. One of the points I made in my post is that this is a multi-faceted update so there isn’t just one factor you need to look at but many different factors all together.

    Given the machine learning element, I think trying to fully understand this update will be futile; instead, I think the advice is the same as it’s always been: provide value to the user. Maybe that bar has just been raised a little higher now…

  • Pingback: Google Patent reveals plan to fill gaps left by content farms & how quality is judged

  • MarkZarian

    “Look at the readability of this article compared to this one and this one”

    eHow is the most difficult to read.

  • http://www.further.co.uk Further Chris

    Excellent analysis. I was also thinking along the lines of Goog using bounce rate and CTR in conjunction with on-page factors to determine quality and relevance. The only question left open for us link builders is whether links from these “penalised” domains will be devalued. That would change a pretty large part of the link graph, considering how large some of these sites are. What are your thoughts?

  • http://www.blogstorm.co.uk Patrick Altoft

    Chris, this is a good question and one that will be hard to analyse. Most article directory links were already devalued, and I can’t see many sites relying totally on these sorts of links. Links from sites like articlesbase are already nofollow anyway.

  • http://www.further.co.uk Further Chris

    True. I guess the effects will be negligible for a rounded link builder if these links are already of very low value. There are, however, some agencies out there whose whole “link building strategy” involves bulk submitting to article/PR sites. In that case I would imagine their clients should start to worry a little!

  • Pingback: Google Farmer Update Updates – AKA “Panda”

  • http://www.tri-win.com Michael

    It appears as though Google is trying to penalize sites that simply link pages with no valuable content. If penalizing link farm sites has a negative effect on the traffic of real pages with links on the farms, and it will, then so be it. Google has been transparent about its views on link farms for years.

  • Content Farm God

    A few inconsistencies in your article. Firstly, the algorithm changed on 24th February, so you meant “week before last”.

    Secondly, the results will undoubtedly change just as much in the UK. You will find content farms ranking just as highly in the UK results as they DID in the US results. This will benefit some independent sites and screw others; a lot of sites rely on “content farms” for their backlinks. If you use article directories and the like as a source of backlinks, then expect your authority to fall somewhat.

    I take 13% of my content farm traffic from the UK, so expect results to flip 360. The algorithm also affects many domains that use keyword-rich titles, so the owner of www dot brightgreenwidgetgadgets.com should be wary. This isn’t just a war on content farms; the biggest gainers have been the big commerce sites such as Amazon.

    You also use eHow as an example; they gained a lot, by the way, no slap for them. That makes me think tweaks are coming before the global rollout.

    Regards,

    Content farmer

  • Content Farm God

    Lol, if I had known that you used Gravatar I wouldn’t have attempted to remain anonymous.

  • Pingback: Google Panda / Google Farmer – what the UK SEO Manager needs to know « Indium Web Management Blog |Online Marketing News, Tips & Advice

  • http://moneymakinginarecession.blogspot.com/ Rose

    Well, given that every single farmer site that has been affected has announced (and is enforcing) improvements to their article quality standards, we shall see whether it really was readability that was the issue.

    My feeling is that G simply put more emphasis on external links, and many of these farms, particularly the article directories, don’t have many deep links to the articles themselves. If it’s lack of links that’s the issue, it’s going to take them forever to correct, especially as some like Ezine have millions of pages.

  • http://excerptz.com ryankett

    Over at Hubpages, where I frequently contribute, many of us suspect that Google has devalued internal links, so your suggestion does have some credibility, Rose. These content farms actually have a large number of links, but pages brought each other up by passing link juice between them.

    The theory that a few of us subscribe to is that Google has devalued internal linking, possibly manually, and that as a result each page is now standing on its own. In addition, it looks like we have suffered as a result of using ‘news’ capsules and internal RSS capsules, as the sitewide duplicate content ration must be high as a result. Thus, many of us are frantically removing RSS feeds.

    To be absolutely honest, “content farms” (I prefer to call them collaborative platforms) needed a kick up the backside. There was a lot of crap on all of them. Expect to see frantic cleaning, stricter regulations, and a steady rise back up the SERPs. I do half suspect something a little more sinister: pages with affiliate links are the most affected.

    But, ultimately, these places have highly skilled search engine optimizers at the top who will be analysing traffic patterns and figuring out precisely what needs to be tweaked to exploit the search engine algo.

    But, like I said, content farms needed a good kick up the backside. I do believe that there is a place on the net for them though.

  • http://excerptz.com ryankett

    *I mean RATIO of course.

  • http://www.contentforconverions.com Russell Barnstein

    I examined all three of the article links included in your post, and I must say that the eHow version is by far the most difficult to read. While the column width of EzineArticles and ArticlesBase could be narrower for faster absorption, the continuity of the text is very readable compared with the eHow format. In fact, readability is often reduced by the addition of pictures, graphics, sub-headings and sub-sub-headings. While all of these do help to tell the search engines about your content, they only serve to hurt readability from a human standpoint.

    People searching for high-quality content won’t want it to be broken up and scattered as if being presented in different article segments. Continuity is the key in order to get readers to consume your entire work and then click through at the end. (Or before the end, if you’re placing your calls to action correctly.)

    I have experimented over the years and found the format for readability (when quality of content is being sought) to be simple: mid-narrow columns and no pictures or graphics. Check out one example of a site that is optimized for the reader in this way: http://www.contentforconversions.com

    Excellent article otherwise, information that I am using as part of an overall plan to address changes to marketing strategies as required by the Panda/Farm update. Thanks for sharing; I’ll be back.

    Russ Barnstein

  • http://www.seoworkers.com John S. Britsios (aka Webnauts)

    @Russell Barnstein I had a look at the site you recommended, and to be honest the blog posts are a bunch of blocks of text.

    Whoever wrote them has most probably never heard the terms “Web Content Accessibility” and “Readability”.

    That said, the blog posts have not been written by a professional copywriter.

    Maybe the topics are good, but I did not even begin reading them when I saw what I saw.

  • http://www.contentforconverions.com Russell Barnstein

    “That said, the blog posts have not been written by a professional copywriter.”

    I’m sorry, I don’t understand this comment. I wrote all of the posts on that site. I am a professional writer and editor and have been making a living solely from various types of writing for the internet for a decade, and have been very successful, thank you.

    The site that I referenced is not a site that is supposed to contain “copy.” It’s a website, so what’s there is called content, not copy. Copywriting is different.

    Perhaps you are unfamiliar with the term copywriting?

    And apparently you don’t agree that any of the samples the author of this blog presented, or the sample I presented, offer good readability. Would you be so kind as to actually show us a good example, then? Offering something useful to this conversation would probably be a wise idea, instead of just negative comments issued in a fairly condescending tone.

    Put your money where your big mouth is.

  • http://www.tri-win.com Michael from Tri-win

    @John S 1) English text should be in a left-to-right block format; it’s the foundation of how we learn to read and write. We like to call them paragraphs.
    2) There is nothing wrong with working with a professional writer, I am a graphic designer and more often than not I wish I had the budget to bring one in for my blog.

    @Russell The only legibility criticism I would offer for the example you presented is that your margin space is huge. There should be less white space on either side of the text to reduce the amount of scrolling needed. Traditionally a paragraph should be 1.5 to 2 “alphabets” in width before it breaks to the next line. Here is an example of what I think is a more readable blog: http://www.tri-win.com/blog/

  • http://www.contentforconverions.com Russell Barnstein

    @Michael;

    Thank you for the constructive criticism; I appreciate it, and you are right. I’ve been experimenting with different themes in an effort to gain better “user friendliness,” as the site in question is highly optimized but gets hit for user friendliness. Originally I did not want ANYTHING on either side of the actual posts themselves, which is why you see such large margins there.

    However, I think you’re right and those margins are too big, too boring. I’d like to think that my services only attract the highest quality user where the reader doesn’t get distracted by other things on the page or lack thereof, but the fact is that people are fickle and they must be catered to.

    I took your suggestion and the suggestion of some others that came post-Google Panda update and made drastic changes to the presentation of content on that site. If you’d like you can see what we’ve done: http://www.contentforconversions.com

    Being that we’re all about high quality content, I need a theme that focuses on content without losing the reader, appears authoritative and clean without too many graphics and distractions. Hopefully we found it in the new theme above.

    I looked at your blog and I REALLY like it. It’s very clean and text-oriented, which I appreciate because it keeps the focus where it should be. Nice job. Now, as a writer I don’t mind this because I understand your thinking; however, the only issue about your blog that I would point out is that your paragraphs are HUGE! ;-)

    My paragraphs were very large like that at one time, but when I shortened them I saw a distinct increase in conversions.

    Now, don’t get me wrong: writing is writing, and if you’re focused on one single topic in a paragraph, then it shouldn’t matter how long that paragraph is. However, for web users it really does matter, which is unfortunate. Lately it seems like I have simply been going back in during editing and splitting paragraphs into two pieces in order to appease the ADHD of most web users. You might want to do the same; just a thought.

    Thanks for the comments, they are well-appreciated and are always put to use when valid, as with your suggestion.

    -Russ

  • http://www.tri-win.com Michael from Tri-Win

    @Russell
    The new layout is great: clean and easy to read. You made good changes. As for my blog, I agree: not only are my paragraphs long but the text is big. Combine that problem with my writing style and the octogenarian-optimized format, and there is a severe need to scroll when reading my blog. I inherited the blog from the previous graphic designer and just never took the time to change the font size. Based on your advice I will need to make it a priority. Thanks.

  • http://pakistan66.blogspot.com Ahsan

    Waiting to see what impact it will have when it launches in other countries.

  • http://www.seoworkers.com John S. Britsios (aka Webnauts)

    Ahsan, if I recall correctly, Panda has already launched in the UK.

  • http://www.seoworkers.com John S. Britsios (aka Webnauts)

    Well, since we are talking about quality content writing (readability, etc.), I would like to share with you my article, which was published yesterday: http://www.searchmarketingstandard.com/avoiding-the-perils-and-pitfalls-of-online-copywriting

  • http://www.tri-win.com Michael from Tri-Win

    John S –
    Thanks for the article. I am not a professional writer, and it helps with some of the questions I had about blog writing. Plus, I appreciate getting that information in a single place for quick reference.

  • http://www.chinatraveldiscovery.com china travel discovery

    I think Google may also make some mistakes on low-quality content and links. If you don’t get backlinks from low-quality sites, it will be OK.

  • http://www.kyrenedesigns.com Katherine

    My site has been affected too, and I’m not a content farm.
    I know there is more to do on my site, and my blog to start up, etc., but it’s gutting to see others now ahead with poor sites and very few links. Something is very odd. But what is even odder is that the top few sites have not changed position even though they still have ‘cheap’ links.
    Back to the drawing board.

  • http://www.mohdshakir.net Mohd Shakir Zakaria

    My site (http://www.techrecipes.net) had been getting a steady traffic increase over the past few months, but traffic dropped significantly after April 12. I don’t know what caused it, but traffic has started to return to normal since yesterday.

    http://www.mohdshakir.net/2011/04/23/are-your-websites-affected-by-googles-panda-update

  • Pingback: Improve the SEO around your personal brand | New Perspectives

  • http://www.refresh-it.net Refresh IT

    Great work by Google yet again. I’ve found the quality of most articles in these directories shocking, especially on ezinearticles.

  • John

    The odd thing about your “how to have a baby boy” example is that the two articles you list as “harder to read” actually rank higher in Google for the term “How to Have a Baby Boy” than the one with better readability.

    Do the search and check it yourself.

  • wardycatt

    The two articles that are supposedly less ‘readable’ are actually far superior to the eHow article. It’s actually hard to read the article on eHow, considering all the adverts that are in the way. This is exactly what Google should be trying to eliminate: sites that are so bloated with adverts that it’s hard to see where the content begins.

    Ads shouldn’t be a major problem for users; after all, many sites freely provide info off the back of paid advertising. However, when the site actually begins to buckle under the weight of ads, then it’s a problem. Even very popular sites such as eBay have compromised loading speed etc. in a bid to squeeze in more adverts. People are generally sick of them. We don’t care about “that one weird rule that’ll help you lose 20lbs in a day” etc. etc. etc. ad nauseam.

  • http://www.frivtown.com friv

    It seems that Panda is good for some webmasters and bad for others. Anyway, maybe this algorithm needs a few updates to work better.