A few months ago, my kids hit an inevitable, but still terrifying, milestone -- they began asking for a pet. Being a complete Scrooge, I quickly set to work explaining that pets are hard work and expensive. Showing a strong knack for journalism, they demanded proof of my assertions, so we set off to the pet store, where my son was quickly ready to invest his birthday money in a small bird.
"Sure, you can buy the bird," I told him. "But what are you going to feed it?"
With the launch of our OpenBlock project in North Carolina, rural newspapers from across the state have called or emailed to express their interest in getting our help installing and using the application. Installing the application isn't much of a challenge, I tell them, but what are you going to feed it?
OpenBlock, a "hyper-local news" platform, is a beast that eats data. So before we can make the Tar Heel State a good breeding ground for the application, we're setting out on a digital public records census. We aim to figure out how well city and county government agencies are living up to the recommendation of the Knight Commission on the Information Needs of Communities that "governments at all levels should ordinarily collect data electronically and in standardized formats."
Unlike many similar audits of public records that have been done by the Associated Press and others, this isn't some sort of exercise to see how governments comply with state and federal open records laws when they think they aren't being watched, so we're happy to describe here how we're going to go about gathering our data. In my dream world, the N.C. Association of County Commissioners sends out a link to this article to its members and implores them to help.
We're focusing our census on a few of the public records that rural newspaper publishers and editors have told us will be most valuable to their readers and advertisers -- births, deaths, land transactions, crime reports and health inspections.
Crime reports are particularly interesting. We know that people love police blotters, but also have real concerns about the safety of victims and the fairness of the criminal process. We know that state and federal agencies collect crime information in digital formats, but it's old and aggregated so it no longer has news value by the time it reaches that place in the information food chain.
To properly gauge the state of digital police records, we have to go to the city and county level. So our first step was to try to find or create a comprehensive list of every law enforcement agency in North Carolina that might generate incident or arrest reports. Thanks to a great report that a state agency submitted to the legislature earlier this year, we have the names of 569 police agencies.
From there, we're in the process of tracking down the website addresses of each agency to examine whether they publish incident and arrest reports there. (We will publish that list shortly, and may ask for your help filling in the blanks.)
Taking an 'Element' State of Mind
The bad news is that there's no indication we'll find a single agency that produces reports in a GeoRSS feed. The good news is that most police departments in the state appear to use a relatively standardized paper form to record police incidents and arrests.
In most cases, we can at least get those pieces of paper. But we've already run into cases in which police departments are unwilling to turn over standard incident reports without first heavily redacting them, misapplying the state's open records law to justify the redactions.
We're interested in the financial viability of OpenBlock, and paper records raise the cost. We'd have to pay people or recruit reliable volunteers to gather the paper records, scan them, and upload them to a DocumentCloud service that could use the layout of the page to extract editorially meaningful elements such as the date, time, location, and description of each document. That becomes almost impossible if we run into handwritten paper reports, which would force us to re-key the documents using local volunteers or perhaps something like Mechanical Turk.
For our census, it is not going to be enough to report that police records are online or offline, or that they are digital or not digital. We really need to be able to describe the format, location and timeliness of each data element. A look at the website for the Winston-Salem Police Department gives a good idea of why we have to get more granular than the "documents state of mind" of traditional investigative reporters.
Winston-Salem publishes to its site what amounts to an index of incident and arrest reports. Each record includes the date, time, "type," case number, "primary offense" and "location." But for incident reports, it also links to a fuller record that provides information that's important for readers and reporters who want to determine the relative news value of each event -- data elements such as whether a weapon was used; the name, age, race and gender of the victim; whether drugs and alcohol were involved; whether anyone was injured; the amount of time the crime went unreported; and descriptions of the items that were stolen.
But missing from even those fuller records are data elements that would be useful for journalists who want to report trends and patterns rather than simple events. Some of the data is omitted with vague claims of "information security purposes," and other data is omitted because of technical limitations of the departments' digital records management systems.
Each element brings with it a different cost of transforming it into a complete and current digital public record.
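To make the element-level idea concrete, here is a minimal sketch of how a single census entry might be recorded, one row per data element rather than one row per document. Everything in it is an assumption for illustration: the `ElementRecord` name, its fields, the "Example PD" agency, the sample values, and the choice of which formats count as machine-readable are hypothetical, not our actual census schema or findings.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of formats treated as machine-readable in this sketch.
MACHINE_READABLE = {"html table", "csv", "georss"}


@dataclass
class ElementRecord:
    """One data element as published by one agency (illustrative only)."""
    agency: str
    element: str             # e.g. "date", "location", "weapon used"
    fmt: str                 # e.g. "html table", "pdf", "paper"
    where: str               # e.g. "agency website", "email", "in person"
    lag_days: Optional[int]  # days until available; None if unknown

    def machine_readable(self) -> bool:
        return self.fmt in MACHINE_READABLE


def summarize(records):
    """Return {agency: (total elements, machine-readable elements)}."""
    summary = {}
    for r in records:
        total, readable = summary.get(r.agency, (0, 0))
        summary[r.agency] = (total + 1, readable + int(r.machine_readable()))
    return summary


# Made-up sample rows for a fictional agency.
records = [
    ElementRecord("Example PD", "date", "html table", "agency website", 1),
    ElementRecord("Example PD", "location", "html table", "agency website", 1),
    ElementRecord("Example PD", "victim age", "paper", "in person", None),
]

print(summarize(records))
```

Counting by element rather than by document is the point of the sketch: an agency can look "online" overall while the elements that matter most to reporters still exist only on paper.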
The variety of formats that our initial tests have already turned up seems to be limited. We've come across PDFs on the web, Word documents delivered daily via email, HTML tables, CSV files on the web, on CD and via email, and the Mac-proof SNP filetype courtesy of Microsoft Access.
Most of these digital formats are created for police departments by one of three vendors that have a corner on 95 percent of the market. But we also know that 15 percent of the state's police agencies -- covering 1 percent of the population -- maintain no digital records.
Police Records Play a Key Role
Police records are far from the most important public records, and they have proven throughout the history of this and other similar applications to be the hardest to get. But they play a key role in determining the viability of OpenBlock at rural papers. Compared with other interesting public records such as real estate transactions or health inspections, police reports are simply more numerous and come out more often. Volume and frequency drive most common measures of audience engagement, such as time on site and return visits.
OpenBlock is a hungry animal, and we've got to find a way to help rural papers feed it without going broke. That's the whole point of our census.
As we set off on our survey, we'll report findings and failings here. We're beginning to imagine some interesting things we'll be able to measure once we have a fuller picture of the state of records in North Carolina.
In the meantime, let me know what your experiences have been gathering digital public records at the state, county and city level. Share your experiences with @OpenRural on Twitter and I'll re-tweet them. I've got lots to learn from you as well.