Far be it from me to question the brilliance of Google, but in the case of its new news meta tagging scheme, I’m struggling to work out why it is brilliant or how it will be successful.

First, we should applaud the sentiment. Most of us would agree that it is a Good Thing to be able to distinguish between syndicated and non-syndicated content, and to link back to original sources. Both are, in theory, important steps forward for news and for the public.

But there are a number of problems with the meta tag scheme that Google proposes.

Problems with Google’s Approach

Meta tags are clunky and likely to be gamed. They are clunky because they cover the whole page, not just the article. If the page contains more than one article or, more likely, contains lots of other content besides the article (e.g. links, promos, ads), the meta tag will not distinguish between them. More important, meta tags are traditionally what many people have used to game the web. Put in lots of meta tags about your content, the theory goes, and you will get bumped up the search engine results. Rather than address this problem, the new Google system is likely to make it worse, since publishers will assume there is material value in adding the “original source” meta tag.
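To make the page-level problem concrete, here is a minimal sketch in Python. The tag names "original-source" and "syndication-source" are my reading of Google's announcement and should be treated as assumptions, and the page, URLs and class names are invented for illustration. The scan finds one source declaration and two articles, with nothing in the markup saying which article the declaration belongs to.

```python
from html.parser import HTMLParser

# Hypothetical page: two articles and a promo share a single <head>.
# Tag names are assumed from Google's announcement; URLs are made up.
PAGE = """
<html><head>
<meta name="original-source" content="http://example.com/story-a">
</head><body>
<div class="article" id="story-a"><h1>Story A</h1></div>
<div class="article" id="story-b"><h1>Story B</h1></div>
<div class="promo">Subscribe today</div>
</body></html>
"""

SOURCE_TAGS = ("original-source", "syndication-source")


class PageScan(HTMLParser):
    """Collects page-level source meta tags and the ids of article blocks."""

    def __init__(self):
        super().__init__()
        self.sources = []   # (tag name, content) pairs found in meta tags
        self.articles = []  # ids of <div class="article"> blocks

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "meta" and attrs.get("name") in SOURCE_TAGS:
            self.sources.append((attrs["name"], attrs.get("content")))
        elif tag == "div" and "article" in classes:
            self.articles.append(attrs.get("id"))


scan = PageScan()
scan.feed(PAGE)
print(scan.sources)   # one declaration for the whole page
print(scan.articles)  # two articles it could refer to
```

A crawler reading this page has to guess whether the declaration covers Story A, Story B, or both.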

Though there is a clear value in being able to identify sources, distinguishing an “original source” from a mere source is fraught with complications. This is something that those of us working on hNews, a microformat for news, have found when talking with news organizations. For example, if a journalist attends a press conference then writes up that press conference, is that the original source? Or is it the press release from the conference with a transcript of what was said? Or is it the report written by another journalist in the room published the following day? Google appears to suggest they could all be “original sources”; if this extends too far then it is hard to see what use it is.

Even when there is an obvious original source, like a scientific paper, news organizations rarely link back to it (even though it’s easy to use a hyperlink). The BBC — which is generally more willing to source than most — has historically tended to link to the front page of a scientific publication or website rather than to the scientific paper itself (something the Corporation has sought to address in its more recent editorial guidelines). It is not even clear, in the Google meta-tagging scheme, whether a scientific paper is an original source, or the news article based on it is an original source.

And what about original additions to existing news stories? As Tom Krazit wrote on CNET:

The notion of ‘original source’ doesn’t take into account incremental advances in news reporting, such as when one publication advances a story originally broken by another publication with new important details. In other words, if one publication broke the news of Prince William’s engagement while another (hypothetically) later revealed exactly how he proposed, who is the “original source” for stories related to “Prince William engagement,” a hot search term on Google today?

Differences with hNews

Something else Google’s scheme does not acknowledge is that there are already methodologies out there that do much of what it is proposing, and that are in widespread use (ironic given Google’s blog post title “Credit where credit is due”). For example, our News Challenge-funded project, hNews, already addresses the question of syndicated/non-syndicated, and in a much simpler and more effective way. Google’s meta tags do not clash with hNews (both conventions can be used together), but neither do they build on its elements or work in concert with them.

One of the key elements of hNews is “source-org”, the source organization from which the article came. Not only does this go part-way toward the second tag Google suggests, “original source”, it also cleverly avoids the difficult question of how to credit a news article that may be based on wire copy but has been adapted since, a frequent occurrence in journalism. The Google syndication method does not capture this important difference. hNews is also already the standard used by the largest American syndicator of content, the Associated Press, and is used by more than 500 professional U.S. news organizations.
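As a rough illustration of how “source-org” travels with each article rather than with the page, the sketch below reads it out of two articles on the same page. The class names (hnews, hentry, entry-title, source-org, vcard, org, fn) follow my understanding of the hNews and hAtom conventions and should be checked against the published drafts. The organizations and headlines are invented, and the parser is a simplified stand-in, not a full microformat parser.

```python
from html.parser import HTMLParser

# Hypothetical markup: two hNews articles on one page, each naming its own
# source organization. No void elements are used, so a simple stack of open
# elements is enough to track nesting.
PAGE = """
<div class="hnews hentry">
  <h1 class="entry-title">Council approves budget</h1>
  <span class="source-org vcard"><span class="org fn">Example Wire Service</span></span>
</div>
<div class="hnews hentry">
  <h1 class="entry-title">Local reaction to budget</h1>
  <span class="source-org vcard"><span class="org fn">Example Daily News</span></span>
</div>
"""


class HNewsScan(HTMLParser):
    """Collects the title and source-org text of each hnews container."""

    def __init__(self):
        super().__init__()
        self.stack = []     # class lists of the currently open elements
        self.articles = []  # one dict per hnews container, in page order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        if "hnews" in classes:
            self.articles.append({"title": "", "source_org": ""})

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def _inside(self, name):
        # True when any open ancestor carries the given class name.
        return any(name in classes for classes in self.stack)

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._inside("hnews"):
            return
        current = self.articles[-1]
        if self._inside("source-org"):
            current["source_org"] += text
        elif self._inside("entry-title"):
            current["title"] += text


scan = HNewsScan()
scan.feed(PAGE)
for article in scan.articles:
    print(article["title"], "|", article["source_org"])
```

Because the credit sits inside each article's own markup, an adapted wire story and a staff-written follow-up can share a page without their sources being confused.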

It’s also not clear if Google has thought about how this will fit into the workflow of journalists. Every journalist we spoke to when developing hNews said they did not want to have to do things that would add time and effort to what they already do to gather, write up, edit and publish a story. It was partly for this reason that hNews was made easy to integrate into publishing systems; it’s also why hNews marks information up automatically.

Finally, the new Google tags only give certain aspects of credit. They give credit to the news agency and the original source but not to the author, or to when the piece was first published, or how it was changed and updated. As such, they are a poor cousin to methodologies like hNews and linked data/RDFa.

Ways to Improve

In theory Google’s initiative could be, as this post started by saying, a good thing. But there are a number of things Google should do if it is serious about encouraging better sourcing and wants to create a system that works and is sustainable. It should:

  • Work out how to link its scheme to existing methodologies — not just hNews but linked data and other meta tagging methods.
  • Start a dialogue with news organizations about sourcing information in a more consistent and helpful way.
  • Clarify what it means by original source and how it will deal with different types of sources.
  • Explain how it will prevent its meta-tagging system from being misused such that the term “original source” becomes useless.
  • Use its enormous power to encourage news organizations to include sources, authors, etc. by ranking properly marked-up news items over plain-text ones.

It is not clear whether the Google scheme — as currently designed — is more focused on helping Google with some of its own problems sorting news or with nurturing a broader ecology of good practice.

One cheer for intention, none yet for collaboration or execution.