Tag: overview

by Jonathan Stray

There are some amazing algorithms coming out of the computer science community which promise to revolutionize how journalists deal with large quantities of information. But building a tool that journalists can use to get stories done takes a lot more than algorithms. Closing this gap has been one of the most challenging and rewarding aspects of [...] more »

by Jonathan Stray

Before computers, all document-driven stories started with a big stack of paper. Often, the first task was to organize all that paper, by sorting individual documents into piles by type. This gives journalists a high-level idea of “what’s in there” and helps them decide what to read more closely — and just as importantly, what [...] more »

by Jarrel Wade

In May, I published a story which described how the Tulsa Police Department in Oklahoma purchased millions of dollars of under-powered and under-tested computer hardware, resulting in a multitude of problems. Emails showed complaints from the field in which officers were unable to get basic police information about dangerous calls when they were en route [...] more »

by Jonathan Stray

Overview produces intricate visualizations of large document sets. They are beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they’re plotted.

Same documents, different visualizations

There are two visualizations in the current prototype version of Overview, and both are based [...] more »

by Jonathan Stray

Reporting on an incident where private security contractors fired at civilians in Iraq is one thing, but reporting on all such incidents is something else entirely. That’s the situation we were faced with when, in reporting on the role of private security firms in Iraq, we wanted to analyze 4,500 pages of recently declassified material [...] more »

by Dan Sinker

I spent a rapid-fire 23 hours in St. Louis this weekend at the NICAR 12 conference. For those who don’t know, NICAR stands for “National Institute for Computer-Assisted Reporting,” and, as the slightly antiquated name might suggest, was founded long before the commercial Internet, back in 1989. Traditionally, the organization (which is run by IRE, [...] more »

by Jonathan Stray

Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We’ve written before about the goals of the project, and we’re developing some new technology, but mostly we’re stealing it from other fields. The following are some of the best ideas we saw in 2011, the data-mining work [...] more »

by Jonathan Stray

The Overview project is an attempt to create a general-purpose document set exploration system for journalists. But that’s a pretty vague description. To focus the project, it’s important to have a set of test cases — real-world problems that we can use to evaluate our developing system. In many ways, the test cases define the [...] more »

by Jonathan Stray

Over the last year, my colleagues and I at The Associated Press have been exploring visualizations of very large collections of documents. We’re trying to solve a pressing problem: We have far more text than hours to read it. Sometimes a single Freedom of Information request will produce a thousand pages, to say nothing of [...] more »