Underwritten by John S. and James L. Knight Foundation

Idea Lab is a group blog by innovators who are reinventing community news for the Digital Age.

  • Each Idea Lab blogger is a winner of the Knight News Challenge grant to reshape community news.

    How Data Can Become an Evergreen Source for Newsrooms

    Knight 2011 News Challenge Winner

    Newsrooms don't fear too much news. They fear not enough news. With news on demand 24/7, the stream of information that journalists work with is becoming the commodity upon which they rely -- which is why "evergreen" stories are becoming a staple for the modern newsroom. What they need now are evergreen news sources.

    So how can data be an evergreen news source? Traditionally, data was hard to work with. It had to be collected, cleaned, organized, and once the effort was made to produce something consumable, it was left to stagnate and rot over time. With ScraperWiki, we've structured our site so that incoming data on the web renews your database and the infrastructure organizing your data flow does not rot.

    For use in the newsroom, however, the output needs to be streamed. Here are a couple of things you can do:

    Data Stress to RSS

    Our Web API can now return query results as an RSS feed. For example, a ScraperWiki user made a scraper that gets alcohol licensing applications for Islington in London. She wanted an RSS feed so she could keep track of new applications in Google Reader. Now all she needs to do is go to the Web API explorer, choose "rss2" for the format, and enter a SQL statement into the query box. That way, she gets only what she wants into her reader without having to change the database.
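
The same request can be put together in code. Here is a minimal sketch of how such a feed URL might be assembled. The endpoint, the scraper name, the table name and the column names are all assumptions made for illustration, as is the idea that the feed expects columns aliased to title, link, description and date; the Web API explorer shows the real values for your scraper.

```python
from urllib.parse import urlencode

# Assumed endpoint, not confirmed by this article; copy the real one
# from the Web API explorer.
API_BASE = "https://api.scraperwiki.com/api/1.0/datastore/sqlite"


def build_feed_url(scraper_name, sql, fmt="rss2"):
    """Return a URL asking the API to run `sql` and format the rows as `fmt`."""
    params = {"format": fmt, "name": scraper_name, "query": sql}
    return API_BASE + "?" + urlencode(params)


# Hypothetical scraper, table and column names, for illustration only.
sql = (
    "select applicant as title, url as link, "
    "address as description, received as date "
    "from swdata order by received desc limit 20"
)
feed_url = build_feed_url("islington_licensing", sql)
print(feed_url)
```

Pasting the printed URL into a feed reader would then subscribe to the query itself, so changing what you follow only means changing the SQL.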

    The Early Data Bird Catches The Story

    One of our savvy users then used ifttt to turn an RSS feed into a Twitter feed. For food safety inspections in Walsall, follow @EatSafeWalsall. In fact, we have a couple of accounts tweeting out scraped data. For ministers', permanent secretaries' and special advisers' meetings, gifts and hospitality at No.10 Downing Street, follow @Scrape_No10. For Edinburgh planning applications, follow @PlanningAppMap. For complaints made against judges in the U.K., follow @OJCstatements.

    Because you can get data in the way you want, you can push data out the way you want and still keep the integrity of the original database. The sources of data for these accounts are very different, and the output scripts need to reflect the timing of each data release. The payoff for that work is that every record can be formed into a sentence with hashtags attached. So if they start trending, you've got a story lead.
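
To give a flavour of what "sentences formed and hashtags attached" might look like, here is a rough sketch of turning one scraped record into a tweet. It is not the script behind any of the accounts above; the function name, the sample record and the 140-character limit are assumptions for illustration, and the link is counted at its full length rather than as a shortened URL.

```python
MAX_TWEET = 140  # assumed character limit


def make_tweet(title, link, hashtags, limit=MAX_TWEET):
    """Build 'title link #tag1 #tag2', trimming the title so it fits `limit`."""
    tags = " ".join("#" + tag.lstrip("#") for tag in hashtags)
    tail = " ".join(part for part in (link, tags) if part)
    room = limit - len(tail) - 1  # one space separates title and tail
    if room < 4:
        raise ValueError("link and hashtags leave no room for a title")
    if len(title) > room:
        # Cut the title and mark the cut with three dots.
        title = title[: room - 3].rstrip() + "..."
    return f"{title} {tail}"


# Hypothetical record, standing in for one row of scraped data.
tweet = make_tweet(
    "New premises licence application: The Example Arms, 1 Example Street",
    "http://example.com/a/1",
    ["Islington", "licensing"],
)
print(tweet)
```

Keeping the link and hashtags intact and trimming only the title means the part readers click on, and the part that can trend, always survives.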

    A New Breed of Data Reporter

    I've been experimenting with data output from ScraperWiki. In fact, I've been talking to it. In preparation for our U.S. tour, I've created a new member of the virtual newsroom. So here's a little something I made earlier:

    It's not what you can do for your data, it's what your data can do for you!

    If you'd like to be a host or sponsor for a scraping event, email nicola[at]scraperwiki.com.
