Yesterday colleagues of mine at MIT were brainstorming plenaries for an upcoming media conference. Data visualization came up, but each of us grumbled. "Overdone," one of us said, to nodding heads. We'd done a session on that at every one of our conferences and forums, as had others at theirs. Data visualization had become tragically hip, as if we were in charge of a music festival and one of us had just proffered Coldplay.
But as we teased out our reservations, we realized that it wasn't visualization that we had an issue with; yes, we agreed, it's an overdone topic, but it's still incredibly useful. Rather, data was the problem. Despite the great leaps made in representing information, we were disappointed in the relatively teeny steps taken to explain how that information is collected, organized, and verified. (In the meeting, I half-jokingly suggested adding a credit card company executive to a plenary, given their companies' dependence on accurate data. It wasn't dismissed out of hand.)
The key issue, it turns out, is transparency throughout the entire data-collecting process (something we wouldn't expect to get from a credit card exec). Matt Hockenberry and Leo Bonanni here at MIT's Center for Future Civic Media try to address this issue with their project Sourcemap.
While pitched as a way to create and visualize "open supply chains," Sourcemap's real virtue is that the data itself is fully sourced. As with the links at the bottom of a Wikipedia article and the accompanying edit history, you know exactly who added the data and where that data came from. You can take that data and make counter-visualizations if you feel the data isn't correctly represented. Sourcemap's very structure acknowledges that visualization is an editorial process and gives others a chance to work with the original data. For example, here's a Sourcemap for an Ikea bed:
In another example, the Washington Post yesterday published a piece about food fraud -- food whose contents or origins are misrepresented. You could use Sourcemap to out companies that lie about their food products. But using the same data, a food producer could use Sourcemap to show how consumer prices are lowered by using certain substitute ingredients and not others. The same goes for visualizations of campaign contributions, federal spending on hospitals, rural broadband penetration, your mayor's ability to get potholes filled, etc. The key, though, isn't necessarily good visualization but good data.
I don't want to understate the genius of good visualizations. But as they say: garbage in, garbage out. Without well-collected, well-organized, transparent data, you never know if you're looking at a mountain of trash.