Forensic SEO Analysis Of United Airlines Google vs Chicago Tribune Story

Investors in United Airlines saw their stock value drop sharply when Google News published a Chicago Tribune / Sun Sentinel story announcing the company was filing for bankruptcy. The story was displayed by Google News as a new story even though it was actually published in 2002.

The Tribune is frantically trying to shift the blame onto Google and has issued a press release that has been picked up by several news websites, including the LA Times and PC World. Google has published a blog post saying that it did nothing wrong:

On Saturday, September 6th at 10:36 PM Pacific Daylight Time (or Sunday, September 7th at 1:36 AM Eastern Daylight Time), the Google crawler detected a new link on the Florida Sun-Sentinel’s website in a section of the most viewed stories labeled “Popular Stories: Business.” The link had newly appeared in that section since the last time Google News’ Googlebot webcrawler had visited the page (nineteen minutes earlier), so the crawler followed the link and found an article titled “UAL Files for Bankruptcy.” The article failed to include a standard newspaper article dateline, but the Sun-Sentinel page had a fresh date above the article on the top of the page of “September 7, 2008” (Eastern).

Because the Sun-Sentinel included a link to the story in its “Popular Stories” section, and provided a date on the article page of September 7, 2008, the Google News algorithm indexed it as a new story. We removed this story as soon as we were notified that it was posted in error.

The Tribune has this to say:

Our records show that due to traffic volume, sometime between 1:00:34 a.m. EDT, Sunday, September 7 (10:00:34 p.m. PDT, Saturday, September 6) and 1:36:03 a.m. EDT, Sunday, September 7 (10:36:03 p.m. PDT, Saturday, September 6), a link to the old article appeared in a dynamic portion of the Sun Sentinel’s business section, grouped with other stories under a tab entitled “Popular Stories Business: Most Viewed.” No new story was published and the old story was not re-published — a link to the old story was merely provided.

Importantly, the URL for the old story did not change when the link appeared on the website’s business section.

At 1:36:57 a.m. EDT, September 7, (10:36:57 p.m. PDT, September 6), our records show that the Google search agent — known as “Googlebot” — crawled the story on Sun Sentinel’s website. Our records also show that the Google search agent had previously crawled this same story numerous times, including as recently as last week. Shortly after Googlebot crawled the Sun Sentinel site this time, however, a link to the story appeared on Google News, with a date of Sept. 6, 2008, provided by Google. At 1:39:59 a.m. EDT, September 7 (10:39:59 p.m. PDT, September 6), our records show the story on the Sun Sentinel website received its first referral from Google News.

What actually happened

The Tribune has stated that the URL of the story didn’t change, but Google says the URL was brand new and had not been crawled before, so who is right? Looking at Google’s search results, I found the exact same article listed at 8 separate URLs.

The link posted to the “Popular Stories” section pointed to this article (now removed), which had a different URL from all the other copies indexed by Google, so Google thought it was a brand new article. In Google’s world, if something has a brand new URL then it’s a brand new page. Perhaps the URL had existed for 6 years without being found by Google: if Google had already indexed the story at a number of very similar URLs, it may well have decided not to crawl the extra one. When that URL suddenly got linked from thousands of other pages on the site, Google probably decided it was important enough to crawl and index.
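To make the difference concrete, here is a minimal Python sketch of two ways a crawler could decide whether a page is "new": by URL alone, or by a fingerprint of the article text. This is only an illustration of the idea, not Google's actual method; the `Index` class, the `fingerprint` helper and the example.com URLs are all hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the article body after collapsing whitespace and letter case."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Index:
    """Toy index (hypothetical) that remembers URLs and body fingerprints."""

    def __init__(self) -> None:
        self.seen_urls = set()
        self.seen_bodies = set()

    def is_new_by_url(self, url: str) -> bool:
        # A URL that has never been seen always counts as a new page.
        new = url not in self.seen_urls
        self.seen_urls.add(url)
        return new

    def is_new_by_content(self, body: str) -> bool:
        # The same text at a different URL is recognised as a duplicate.
        fp = fingerprint(body)
        new = fp not in self.seen_bodies
        self.seen_bodies.add(fp)
        return new


if __name__ == "__main__":
    index = Index()
    body = "UAL Files for Bankruptcy ..."
    for url in ("http://example.com/story-a", "http://example.com/story-b"):
        print(url, index.is_new_by_url(url), index.is_new_by_content(body))
```

Under the URL-only check both addresses look new; under the content check the second one is flagged as a copy of something already indexed.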

Why was this story listed in the Popular Stories section?

The big mystery that nobody has picked up on is how a 6-year-old story came to be listed in the “Popular Stories” section in the first place. Assuming it wasn’t a software glitch, my guess is that either there was a sudden increase in search volume for a particular keyword or the story somehow started to gain traffic from social media.

If I were to heavily promote an old story (one that didn’t display its publication date prominently) using Digg and StumbleUpon, it could quite easily attract a lot of traffic and be pushed into the “Popular Stories” section of a major website.

According to the WSJ, just one visitor was enough to push the story into the Popular Stories section:

In its latest explanation, Tribune said a single visit during a low-traffic period early Sunday morning pushed the undated story onto the list of most popular business news of its South Florida Sun-Sentinel newspaper’s Web site.

About 30 minutes after that visit, a user viewing a story about airline-cancellation policies during a storm-ravaged weekend clicked on the link for the old story.
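A single visit can only have this effect if the "most viewed" list is ranked on raw view counts with no minimum. The Python sketch below illustrates that under stated assumptions: the Tribune's real ranking logic has not been published, and the `most_viewed` function, its `min_views` parameter and the story paths are all made up for the example.

```python
from collections import Counter


def most_viewed(views: Counter, top_n: int = 5, min_views: int = 1) -> list:
    """Return up to top_n story paths ranked by views, skipping any below min_views."""
    ranked = [(path, n) for path, n in views.most_common() if n >= min_views]
    return [path for path, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical view counts for one quiet hour in the business section.
    quiet_hour = Counter({"/storm-cancellations": 3, "/old-ual-story": 1})

    # With no floor, one visit is enough to get listed.
    print(most_viewed(quiet_hour))

    # With even a small floor, the one-visit story stays off the list.
    print(most_viewed(quiet_hour, min_views=2))
```

A floor like this is one possible safeguard; a site could equally well exclude stories older than a given age from the list.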

Did the Tribune ask Google to stop crawling them?

According to newspaper reports, the Tribune claims that it asked Google to stop crawling its sites. Google denies the claim, so who is right?

Tribune said it asked Google “months ago” to stop using Googlebot to crawl its Web sites after it identified problems with the program. But Google denied such a request was ever made.

“The claim that the Tribune Company asked Google to stop crawling its newspaper Web sites is untrue,” it said.

Looking at the robots.txt file, which is the method every other website uses to block or control Google’s spider, we have to conclude that the Tribune didn’t ask Google to stop crawling its sites. Perhaps they sent an email or made a phone call; clearly, whatever they did wasn’t effective.
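For readers who haven’t used robots.txt, the short Python sketch below shows what a block on Googlebot would look like and how to test it with the standard library’s `urllib.robotparser`. The rules and the example.com URL are hypothetical; they are not taken from the Tribune’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that would tell Googlebot to stay away from the whole site.
BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    story = "http://www.example.com/business/old-story.html"
    print(allowed(BLOCK_GOOGLEBOT, "Googlebot", story))  # blocked by the rule
    print(allowed(BLOCK_GOOGLEBOT, "OtherBot", story))   # no rule applies
    print(allowed("", "Googlebot", story))               # empty file, nothing blocked
```

Two lines in a text file are all it takes, which is why the absence of such a rule is telling.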

Could the Tribune face legal action?

If my analysis is correct and the Tribune did publish the story at multiple URLs without a correct datestamp, then I believe they have been negligent. In the era of social media, where 100,000 people can be directed to an article within hours, publishing without a valid time and date at the top of the article is highly irresponsible.
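The risk of a missing dateline can be sketched in a few lines of Python. The logic below simply mirrors the behaviour Google describes in its statement (no dateline on the article, so the date at the top of the page was used); it is an assumption for illustration, not Google’s real algorithm. The function names and the two-day freshness window are hypothetical, and the 2002 date is a placeholder because only the year is known here.

```python
from datetime import date
from typing import Optional


def apparent_date(dateline: Optional[date], page_date: date) -> date:
    """Prefer the article's own dateline; otherwise fall back to the page date."""
    return dateline if dateline is not None else page_date


def looks_fresh(dateline: Optional[date], page_date: date,
                today: date, max_age_days: int = 2) -> bool:
    """Treat a story as new if its apparent date is within max_age_days of today."""
    age = (today - apparent_date(dateline, page_date)).days
    return 0 <= age <= max_age_days


if __name__ == "__main__":
    crawl_day = date(2008, 9, 7)

    # No dateline: the page's own "today" date makes the old story look new.
    print(looks_fresh(None, crawl_day, crawl_day))

    # With a dateline (placeholder day in 2002) the story is clearly old.
    print(looks_fresh(date(2002, 1, 1), crawl_day, crawl_day))
```

A visible, machine-readable publication date on every article would remove the need for any fallback at all.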

The fact that the CMS was designed without considering the implications of multiple URLs, and how Google News might handle them, is also highly irresponsible for a major media organisation.

Unless news sites realise the importance of these issues, we will see this type of incident happen again and again.

By Patrick Altoft at 11:00AM on Thursday, 11 Sep 2008

Patrick is the Director of Strategy at Branded3 and has spent the last 11 years working on the SEO strategies of some of the UK's largest brands. Patrick’s SEO knowledge and experience is highly regarded by many, and he’s regularly invited to speak at the world’s biggest search conferences and events.

7 Responses to “Forensic SEO Analysis Of United Airlines Google vs Chicago Tribune Story”

  1. Vinay says:

    One of the best articles I have come across at BlogStorm in recent times! Thumbs up! Very, very interesting analysis! It shows how the power of digital media can change the stock value of a company!

    Cheers!

  2. Shane says:

    Wow. I feel really bad for UAL and for whomever might face losing their jobs over not doing the most basic things to prevent it from happening. I hope this gets huge publicity, though, so that other organizations will panic and take steps now to be sure it never happens to them. Data backup and recovery was never much of a big deal until major catastrophes started wreaking havoc. Likewise, data security went largely ignored until huge blocks of sensitive consumer data were lost. And both of those took way more effort and capital expense than fixing this problem will.

  3. [...] a news story at multiple URLs without a proper publication date in the era of search engine optimization, or SEO, seems just a bit irresponsible for a major news [...]

  4. [...] 9/11/2008: BlogStorm has written a great post about what happened, and points out all of the duplicate content that might have caused the issue [...]

  5. Micky says:

    Great article and analysis. I have to say I’m amazed at how many ‘news’ articles can be found online on web pages without any indication of when the article was published. Worse still, if you proceed to read the story, it talks about ‘yesterday’ or ‘next week’ – and everyone reading it is left wondering whether this is current, or perhaps a few months old, or even a story from many years ago. It’s very frustrating, and it’s now been shown to be dangerous as well.

  6. [...] another very remarkable piece of news this week was the complete collapse of United Airlines shares caused by an old press release in News Search. Andre Scholten also came up with a nice blog post about [...]

  7. Kenneth says:

    So here’s the piece of this that confuses me: Investors acted on an incorrect piece of information they got from Bloomberg. Whatever the series of events leading up to it, it seems like Bloomberg is the company that flubbed here, much as they flubbed when they ran an obituary for Steve Jobs a few weeks ago.

    Regarding Tribune, it looks like they didn’t really take any action at all in this, and in any event they’re not responsible for anything that Google did with their content. Their system was probably designed pre-Google News, if not pre-Google. Tribune isn’t required to cater their systems to Google or anyone else, so I don’t understand why any blame would fall on them. No investor says they sold United Stock because they read the article in a Tribune paper, they sold because they read it in Bloomberg. That fact isn’t really in dispute.

    Google has a system that scrapes news, and everyone knows that no human being ever looks at it before it gets posted. It’s up to the journalists at Income Securities Advisors and Bloomberg to verify information before posting it. Failing to do that was essentially yelling fire in a crowded room, and neither Google nor Tribune caused them to do that.

    Am I missing anything here?