I spent last Friday admiring the views of the Hudson from the 15th floor of the NY Times building, alongside Lisa Williams. Though it was billed as a “hack day,” there wasn’t much actual hacking going on that I could find. There was a steady stream of presenters, most of them funny, all of them plenty worth listening to. It was a day well spent, but not a day spent hacking.

Fair warning: I wasn’t trying to capture the essence of the day so much as taking notes on what struck me as relevant to my own work, with an emphasis on questions I’ve been asking myself lately. I drifted off during the segments on the book and film review APIs, which, while neat-o and potentially of interest to lots of people, have little bearing on Gotham Gazette’s public policy coverage.

Despite a few unsuccessful attempts to inspire participants to identify themselves, it was hard to get a real handle on who was in the room. Male, mostly. 30ish, generally. It turned out I was sitting next to a designer from USA Today. There seemed to be a handful of other publications at the hack day, as well as people from data-driven projects ranging from a very cool dictionary effort (though with my collected business cards in another coat pocket and no attendee list in evidence, I can’t tell you which dictionary effort; Alphabetasomething) to OpenCongress and Sunlight.

First up was Tim O’Reilly, who had some wise words about our own data as publishers. Or, about the NY Times’ data as a publisher. And about finding meaning in user generated content and looking for ways to turn that meaning into something useful. More precisely, turn that meaning into real-time user-facing services, but that only makes sense if you’re going to sit through his whole talk. This is crowdsourcing he’s talking about: inviting your readers to participate in reporting in a way that is meaningful. So the question is: Could we do a better job of anticipating how readers want to participate and giving them a framework for that?

O’Reilly had a lot of good questions about the business of publishing. The questions I’m asking now are less about reader participation than about who our readers are and what they’re doing:
Looking at our “Most Emailed” stories: do we know more about who is emailing stories? Should people you’ve sent stories to make up a social network? Are they your web? Can we tell a story about who is blogging articles? What articles are being blogged? We can answer that question on a purely objective level, but we don’t aggregate or compare data about articles that are or aren’t being blogged. When a popular blog links to a Gotham Gazette story, the post can get dozens of comments while our own forums sit quiet. We don’t have a good way to document or measure that conversation. I’m not as worried about eyeballs on ads as about being able to demonstrate that we’re making a positive contribution to civic conversations.

How much of that did Tim O’Reilly say? I wasn’t taking that kind of notes. I still think these are good questions to ask ourselves.

There were pieces of Tim’s talk that troubled me, too, though. Pieces that reflect some of my frustration with social media standards in general. He suggested that the Times could do more to celebrate frequent commenters, but there are good reasons not to: you encourage people who want that attention to comment, but don’t necessarily foster conversations. Moreover, I’m not sure that focussing on the loudest voice encourages soft-spoken readers to say their piece. And I think that encouraging discussion should be at least partly about making sure that quieter voices get heard.

O’Reilly also asked for more “not again” features, more ways to ensure that you don’t see an article you’ve already read. Again: I’m not convinced this is the best way to foster civic engagement. Sure, when you’re reading the paper the first time you might want to skip straight to the new stuff, but I’d really like you to be able to come back to Gotham Gazette later in the day to read that story again or point it out to a friend. I think there is real value in things staying in the same place, at least for a little while.

This is my list of websites to look at again, gathered over the course of the day: MediaCzar’s maps of Twitter tweeters; USAspending.gov, picking up what OMB Watch’s Fedspending.org has been doing for a while (with their blessing and their toolset); Stimulus Watch for more community budget monitoring; and blprnt (http://blog.blprnt.com/) for gorgeous visualizations of NYT content based on data from their feeds. And some tools: Apture, which I’d heard of but didn’t know how to spell; bit.ly, which adds a lot of analytics to your shortened URLs; Semantalyzer for more structured keywords; and OpenSearch for search result standardization, a concept I need to noodle a little more.

The whole point of the day was to get us all using the Times APIs, but the ones I really, really want are still under development, like plans to extend the Campaign Finance API to New York State data or to open up Represent to outside lookups. The latter is something NYPIRG (http://www.nypirg.org/) used to do, but they dropped their Who Represents Me project when maintaining the GIS servers finally overwhelmed them. They’d already spun off most of their mapping work, leaving Who Represents Me something like orphaned.

I wouldn’t say I live tweeted the event (secret: I loathe live tweets of events, especially events I’m not at), but I did my share of whispering from the back of the room.