Unstructured data is often said to account for as much as 80 percent of the information stored on business computer systems. Although the figure is widely cited, I’m inclined to agree with Seth Grimes that this 80 percent rule is inflated, at least for some types of business. Still, if we could structure even a fraction of that data, it could create significant value for small newspapers.
The type of data that has my attention is free-form text. Small newspapers in particular have computers full of text files containing information about their communities. Often these files lie dormant on the hard drive of a dusty computer somewhere in the back of the newsroom, inaccessible to the public. Compounding the problem, newspapers realize no additional value from this content, even though they paid journalists to produce it. The information is gathered, only part of it is published, and the rest sits unused and untouched.
To understand the potential of resurrecting this unstructured data, it helps to know the workflow of a traditional small newspaper.
It surprised me several years ago when I learned that most community newspapers use a very low-tech workflow for managing their data. A typical newspaper might organize its content in hierarchical folders as shown in the example below. Files are grouped by month, then named with the day of publication:
The workflow is simple and effective, and it has served its purpose for many years. Once a file’s publication date has passed, it is ignored forever. At best, a selection of these files is copied and pasted into a content management system for publication online. But this seldom happens until after the newspaper’s print edition has been completed. At that point, the newspaper has little incentive to process the files further, because attention must shift to the next day’s edition.
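To picture that folder convention in code, here is a rough Python sketch of how a script might recover each file’s publication date from its location. It assumes month folders named like `2011-03` and file names that start with the day, such as `15-council.txt`; those names, along with the functions `publication_date` and `catalog`, are hypothetical, and real newsrooms will follow their own conventions.

```python
from __future__ import annotations

import re
from datetime import date
from pathlib import Path

# Hypothetical layout: <root>/<YYYY-MM>/<DD>-<slug>.txt, e.g. archive/2011-03/15-council.txt
MONTH_DIR = re.compile(r"^(\d{4})-(\d{2})$")
DAY_FILE = re.compile(r"^(\d{1,2})(?!\d)")  # one or two leading digits, not part of a longer number


def publication_date(path: Path) -> date | None:
    """Infer a publication date from a month folder and a day-prefixed file name."""
    month_match = MONTH_DIR.match(path.parent.name)
    day_match = DAY_FILE.match(path.stem)
    if not (month_match and day_match):
        return None  # the file does not follow the assumed convention
    year, month = int(month_match.group(1)), int(month_match.group(2))
    try:
        return date(year, month, int(day_match.group(1)))
    except ValueError:
        return None  # e.g. a "30" file inside a February folder


def catalog(root: Path) -> list[tuple[date, Path]]:
    """Walk the month folders under root and return (date, file) pairs, oldest first."""
    entries = []
    for path in root.glob("*/*.txt"):
        pub = publication_date(path)
        if pub is not None:
            entries.append((pub, path))
    return sorted(entries)
```

For example, `publication_date(Path("archive/2011-03/15-council.txt"))` returns `date(2011, 3, 15)`, while files that don’t fit the assumed pattern return `None` and are left out of the catalog rather than guessed at.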
The low-tech workflow described above helps illustrate the potential of the CMS Upload Utility, my Knight News Challenge project. It’s an inexpensive way to move text files into a web-accessible database. Once the text is inside a database, there are many ways to create value from it. In my next post, I’ll share several sample use cases to help explain how the application works.
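The details of how the CMS Upload Utility works are left for the next post, so the following is not its code, only a minimal Python sketch of the general idea. It uses the standard-library `sqlite3` module as a stand-in for a web-accessible database; the `stories` table and the `load_archive` and `search` functions are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def load_archive(root: Path, db_path: str) -> int:
    """Copy every .txt file under root into a SQLite table; return how many were added."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stories ("
            " path TEXT PRIMARY KEY,"  # e.g. '2011-03/15-council.txt', relative to root
            " body TEXT NOT NULL)"
        )
        before = conn.total_changes
        for path in sorted(root.rglob("*.txt")):
            # Old newsroom files are not always clean UTF-8, so replace bad bytes
            # rather than abort the whole import.
            body = path.read_text(encoding="utf-8", errors="replace")
            # The path is the primary key, so re-running skips files already loaded.
            conn.execute(
                "INSERT OR IGNORE INTO stories (path, body) VALUES (?, ?)",
                (path.relative_to(root).as_posix(), body),
            )
        conn.commit()
        return conn.total_changes - before
    finally:
        conn.close()


def search(db_path: str, term: str) -> list[str]:
    """Return paths of stored stories whose text contains term (ASCII case-insensitive)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT path FROM stories WHERE body LIKE ? ORDER BY path",
            (f"%{term}%",),
        )
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Calling `load_archive(Path("archive"), "archive.db")` and then `search("archive.db", "council")` would list every stored file that mentions the council. A production setup would more likely put a server database behind a web application, but even this small table turns a folder of forgotten files into something searchable.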
For now, though, think about all that unstructured data and how we can make better use of it.