The first PANDA task officially checked off our to-do list was drafting our Future Users Survey. The PANDA project aims to make basic data analysis quick and easy for news organizations, and to make data sharing simple. We distributed a link to the survey via Twitter, the NICAR-L mailing list and email. The survey covers a range of topics that we felt were crucial to understanding our future users, including the technical aptitude of the staff in their newsrooms, the quantity of data they work with, and possible barriers to using the software.
So far, we've had 77 responses to the survey. For the curious, we've put an anonymized summary of the results online.
Our future users
We had responses from many of the major newsrooms around the country. But we hesitate to draw any final conclusions, because it looks like we reached few smaller newsrooms. With that caveat in mind, the data contain several interesting statistics.
Most of the newsrooms surveyed indicated they are technically savvy, with 74 percent reporting they are likely to support running applications in-house. A smaller group, 57 percent, are DocumentCloud users, which we treat as an indicator that an organization would be willing to adopt new newsroom tools.
One of the most striking things to emerge from the survey is the quantity of data our future users reported working with. Thirty-eight percent of users reported working with a single dataset in excess of 10 million rows. In the nearly two years I worked at the Chicago Tribune, we saw only a handful of datasets at this scale. Furthermore, 36 percent of users reported having a cumulative quantity of data in the range of hundreds of millions, or even billions, of rows. We've reached out to users who reported these "big data" numbers to get a better grasp of what sorts of data they are working with. The answer will inform our approach to an interesting design challenge: determining what scale we intend to support and how much time we will invest in documenting strategies for scaling beyond those initial limits.
Tools of the trade
The survey also asked about the technology used within newsrooms, to learn which tools are already in widespread use. A few quick hits from the results:
- 86 percent use at least one Google utility -- Docs, Fusion Tables or Refine.
- 86 percent reported using at least one SQL database.
- 75 percent use at least one programming language.
- 58 percent of newsrooms use Python, by far the most widely used of any programming language. This bodes well for PANDA's ability to find a niche of power users and contributors.
We provided respondents the opportunity to sound off on what sorts of issues might prevent them from using PANDA as a hosted service, if that is what we decide to build. A large number indicated that they had security concerns about putting data online, and several stated outright that they would not, or could not, use a hosted service. We haven't made a decision about whether PANDA should be a hosted service, but these results will certainly guide our thoughts.
These statistics give us an empirical, if incomplete, picture of our audience. At ONA we will meet for a planning session, and this survey will factor heavily into the road map we build for the rest of the year. We will also try to interview more future users and follow up with some who replied to the survey. If you will be at ONA, please find one of us in the red PANDA shirts and let us know what PANDA can do to better serve your newsroom. It's also not too late to fill out the survey. If you haven't taken it, please take a few minutes to do so here.