Tag: ap

by Jonathan Stray

There are some amazing algorithms coming out of the computer science community which promise to revolutionize how journalists deal with large quantities of information. But building a tool that journalists can use to get stories done takes a lot more than algorithms. Closing this gap has been one of the most challenging and rewarding aspects of […]

by Jonathan Stray

Before computers, all document-driven stories started with a big stack of paper. Often, the first task was to organize all that paper, by sorting individual documents into piles by type. This gives journalists a high-level idea of “what’s in there” and helps them decide what to read more closely — and just as importantly, what […]

by Jonathan Stray

Reporting on an incident where private security contractors fired at civilians in Iraq is one thing, but reporting on all such incidents is something else entirely. That’s the situation we were faced with when, in reporting on the role of private security firms in Iraq, we wanted to analyze 4,500 pages of recently declassified material […]

by Jonathan Stray

Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We’ve written before about the goals of the project, and we’re developing some new technology, but mostly we’re stealing it from other fields. The following are some of the best ideas we saw in 2011, the data-mining work […]

by Jonathan Stray

The Overview project is an attempt to create a general-purpose document set exploration system for journalists. But that’s a pretty vague description. To focus the project, it’s important to have a set of test cases — real-world problems that we can use to evaluate our developing system. In many ways, the test cases define the […]

by Martin Moore

The International Press Telecommunications Council (IPTC) has just launched rNews, a consistent, machine-readable way of expressing news metadata in RDFa (a linked data language). This post explains some of the differences between rNews and hNews and why, if you publish news on the web, you ought to be using one or the […]

by Martin Moore

Far be it from me to question the brilliance of Google, but in the case of its new news meta tagging scheme, I’m struggling to work out why it is brilliant or how it will be successful. First, we should applaud the sentiment. Most of us would agree that it is a Good Thing that […]

by Martin Moore

People in news don’t generally think of innovation as their job. It’s that old C.P. Snow thing of the two cultures, where innovation sits on the science side, not the arts side. I had my own experience of this at the American Society of Newspaper Editors conference in Washington a couple of months ago. After one […]

by Martin Moore

Defining principles of journalism is difficult. Rewarding, but difficult. Back in 2005 it took the Los Angeles Times a year of internal discussions to settle on its ethical guidelines for journalists. The Committee of Concerned Journalists took four years, did oodles of research and held 20 public forums, in order to come up with a […]

by Martin Moore

We are on the cusp of something exciting. Thousands of news articles marked up with hNews, a microformat for news content funded by the Knight Foundation, will soon start populating the Internet. Last week, hNews became an official draft microformat. Having been proposed as a new data format and then discussed within the microformats […]