
Underwritten by John S. and James L. Knight Foundation

Idea Lab is a group blog by innovators who are reinventing community news for the Digital Age.

  • Each Idea Lab blogger is a winner of the Knight News Challenge grant to reshape community news.


    Love PANDA? Try csvkit to Standardize Your Data

    Knight 2011 News Challenge Winner

    Now that PANDA has been out in the world for a while, we would like to suggest you also take a look at a "cousin" project of sorts: csvkit. csvkit is a suite of power tools for working with comma-separated values (CSV) files. It's a command-line toolkit, so you will need some minimal technical know-how to get started, but it also has extensive documentation. Those new to such tools will find the tutorial particularly useful!


    The PANDA project aims to make basic data analysis quick and easy for news organizations, and to make data sharing simple. Both PANDA and csvkit have at their core the belief that simple data formats are the best data formats. In PANDA's case this means that, although we import from CSV or Excel files, we only ever export to simple CSV files. This gives us the highest degree of interoperability with other software packages, because almost anything can read a CSV file. csvkit builds on the simplicity of CSV files by allowing you to do all sorts of useful things with them.

    WHAT CAN I DO WITH CSVKIT?
    Here are some of the things that you can do with csvkit:

    • Convert Excel, DBF, fixed-width and JSON files into CSV with in2csv.
    • Filter a CSV down to a subset of columns with csvcut.
    • Search and filter rows of a CSV with csvgrep.
    • Perform SQL-like "joins" between CSV files with csvjoin.
    • Convert a CSV to JSON or GeoJSON with csvjson.
    • Import a CSV straight into a database with csvsql.
    • Generate summary statistics for a CSV with csvstat.
    • And a whole lot of other useful things!
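To make the column-selection and row-filtering ideas in the list above more concrete, here is a minimal sketch that uses only Python's standard csv module. It is not how csvkit is implemented, and it does not call csvkit. The function names cut_columns and grep_rows and the sample data are hypothetical, chosen only to mirror what csvcut and csvgrep do conceptually.

```python
import csv
import io


def cut_columns(rows, columns):
    """Keep only the named columns, in the order given (the idea behind csvcut)."""
    return [{name: row[name] for name in columns} for row in rows]


def grep_rows(rows, column, text):
    """Keep rows whose value in `column` contains `text` (the idea behind csvgrep)."""
    return [row for row in rows if text in row[column]]


# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "name,county,population\nAda,Cook,100\nBen,Lake,250\nCy,Cook,75\n"

# Parse the CSV text into a list of dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Filter rows first, then reduce to the columns of interest.
cook_only = grep_rows(rows, "county", "Cook")
result = cut_columns(cook_only, ["name", "population"])

# Write the reduced table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "population"], lineterminator="\n")
writer.writeheader()
writer.writerows(result)
print(out.getvalue())
```

With csvkit itself the same two steps would be done by chaining csvgrep and csvcut on the command line, which is usually the more convenient route for one-off cleanup.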

    PANDA and csvkit are a natural pairing of tools. Use csvkit to clean up or standardize your data before importing it into PANDA. Have a DBF or fixed-width file you can't import into PANDA? Use csvkit to convert it into a CSV. Need to use data from your PANDA in your web app? Export it to CSV and use csvjson to convert it to JSON. Using csvkit with PANDA will make your newsroom data even more useful.
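As a rough illustration of the "export to CSV, then convert to JSON" workflow described above, the following sketch turns CSV text into a JSON array with Python's standard library. It is an assumption-laden stand-in, not csvjson itself: the function name csv_to_json and the sample export are hypothetical, and every value stays a string here.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text into a JSON array of objects, one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Hypothetical stand-in for a CSV file exported from PANDA.
EXPORTED = "id,title\n1,Budget\n2,Payroll\n"

json_text = csv_to_json(EXPORTED)
print(json_text)
```

A web app could then load the resulting JSON directly, which is the use case the paragraph above has in mind.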

    csvkit works on Linux, OS X and Windows with Python versions 2.6 and 2.7 or with PyPy. Head over to the documentation and get started with faster, better data processing.

    Christopher Groskopf is the lead developer on the PANDA Project and a former developer on the Chicago Tribune's News Applications Team. He is also the creator of django-boundaryservice, csvkit, and Hack Tyler. His residence is in flux, but you can find him on Twitter regardless of his present whereabouts: @onyxfish.
