Jonathan Stray

Underwritten by John S. and James L. Knight Foundation

Idea Lab is a group blog by innovators who are reinventing community news for the Digital Age.

  • Each Idea Lab blogger is a winner of the Knight News Challenge grant to reshape community news.

    Jonathan Stray

    How the AP's Overview Turns Documents Into Pictures

    Overview produces intricate visualizations of large document sets -- beautiful, but what do they mean? These visualizations are saying something about the documents, which you can interpret if you know a little about how they're plotted.

    Same documents, different visualizations

    There are two visualizations in the current prototype version of Overview, and both are based on document clustering. The first is the items plot, which grew out of the proof-of-concept system we presented a year ago. Every document is a dot. Similar documents get pulled together to form visible groups, that is, clusters. All the dots start grey, but become...
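To make "similar documents get pulled together" concrete, here is a minimal sketch of one common way to score document similarity: TF-IDF weighting with cosine similarity. This is an assumption for illustration only; the excerpt does not say which similarity measure Overview uses, and the function names and sample texts below are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document.

    Illustrative helper, not Overview's code. Tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: two incident-style reports and one unrelated note.
docs = [
    "contractor fired weapon at vehicle",
    "contractor fired warning shots at vehicle",
    "budget report for fiscal year",
]
vectors = tfidf_vectors(docs)
sim_reports = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
```

In a plot like the one described, a pair with a high score (the two reports) would sit close together, while a pair sharing no terms (a report and the budget note) would drift apart.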

    Jonathan Stray

    How Overview Visualized 4,500 Pages of Declassified Iraq War Documents

    Reporting on an incident where private security contractors fired at civilians in Iraq is one thing, but reporting on all such incidents is something else entirely. That's the situation we were faced with when, in reporting on the role of private security firms in Iraq, we wanted to analyze 4,500 pages of recently declassified material -- the raw reports generated every time a security contractor working for the U.S. Department of State fired a weapon in Iraq, from 2005 to 2007. There was more material here than we could possibly read on deadline, so we used our prototype Overview document-mining...

    Jonathan Stray

    The Top 10 Data-Mining Links of 2011

    Overview is a project to create an open-source document-mining system for investigative journalists and other curious people. We've written before about the goals of the project, and we're developing some new technology, but mostly we're stealing it from other fields. The following are some of the best ideas we saw in 2011, the data-mining work that we found most inspirational. Many of these links are educational resources for learning about specific technology. Some of this work illuminates how algorithms and humans treat information differently. Others are just amazing, mind-bending work.

    1. What do your connections say about you? A lot....

    Jonathan Stray

    3 Difficult Document-Mining Problems that Overview Wants to Solve

    The Overview project is an attempt to create a general-purpose document set exploration system for journalists. But that's a pretty vague description. To focus the project, it's important to have a set of test cases -- real-world problems that we can use to evaluate our developing system. In many ways, the test cases define the problem. They give us concrete goals, and a way to understand how well or poorly we are achieving those goals. These tests should be diverse enough to be representative of the problems that journalists face when reporting on document sets, and challenging enough to push...

    Jonathan Stray

    AP's Overview Will Try to Make Sense of Mountains of Data

    Over the last year, my colleagues and I at The Associated Press have been exploring visualizations of very large collections of documents. We're trying to solve a pressing problem: We have far more text than hours to read it. Sometimes a single Freedom of Information request will produce a thousand pages, to say nothing of the increasingly common WikiLeaks-sized dumps of hundreds of thousands of documents, or huge databases of public documents. Because reading every word is impossible, a large data set is only as good as the tools we use to access it. Search can help us find what...
