Now that PANDA has been out in the world for a while, we would like to suggest you also take a look at a "cousin" project of sorts: csvkit. csvkit is a suite of power tools for working with comma-separated values (CSV) files. It's a command-line toolkit, so you will need some minimal technical know-how to get started, but it also has extensive documentation. Those new to such tools will find the tutorial particularly useful!
The PANDA project aims to make basic data analysis quick and easy for news organizations, and to make data sharing simple. Both PANDA and csvkit have at their core the belief that simple data formats are the best data formats. In PANDA's case this means that, although we import from CSV or Excel files, we only ever export to simple CSV files. This gives us the highest degree of interoperability with other software packages, since almost anything can read a CSV file. csvkit builds on the simplicity of CSV files by allowing you to do all sorts of useful things with them.
WHAT CAN I DO WITH CSVKIT?
Here are some of the things that you can do with csvkit:
- Convert Excel, DBF, fixed-width and JSON files into CSV with in2csv.
- Filter a CSV down to a subset of columns with csvcut.
- Search and filter rows of a CSV with csvgrep.
- Perform SQL-like "joins" between CSV files with csvjoin.
- Convert a CSV to JSON or GeoJSON with csvjson.
- Import a CSV straight into a database with csvsql.
- Generate summary statistics for a CSV with csvstat.
- And a whole lot of other useful things!
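To give a feel for the kind of work tools like csvcut and csvgrep take off your hands, here is a rough sketch of column selection and row filtering written with only Python's standard library. This is not how csvkit is implemented, and it is far less capable: the function name, the sample data, and the column names are all made up for illustration, the filter is a plain exact match, and the code assumes Python 3.

```python
import csv
import io

# Hypothetical sample data; names and numbers are invented for illustration.
SAMPLE = """name,county,population
Alpha,North,100
Beta,South,200
Gamma,South,300
"""


def cut_and_filter(text, columns, match_column, match_value):
    """Keep only `columns`, and only rows where `match_column` equals `match_value`."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Exact-match filter: a deliberate simplification.
        if row[match_column] == match_value:
            writer.writerow({c: row[c] for c in columns})
    return out.getvalue()


result = cut_and_filter(SAMPLE, ["name", "population"], "county", "South")
print(result)
```

With csvkit, each of these steps is a single command, and the commands can be chained together on the command line instead of being hand-written for every file.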
PANDA and csvkit are a natural pairing of tools. Use csvkit to clean up or standardize your data before importing it into PANDA. Have a DBF or fixed-width file you can't import into PANDA? Use csvkit to convert it into a CSV. Need to use data from your PANDA in your web app? Export it to CSV and use csvjson to convert it to JSON. Using csvkit with PANDA will make your newsroom data even more useful.
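As an illustration of the last workflow, the sketch below turns CSV text into JSON using only Python's standard library. It is a stand-in to show the idea, not csvjson itself: the function name and sample export are hypothetical, every value is left as a string, and the code assumes Python 3.

```python
import csv
import io
import json

# Hypothetical stand-in for a CSV file exported from PANDA.
EXPORT = """name,score
Alpha,1
Beta,2
"""


def csv_to_json(text):
    """Convert CSV text into a JSON array with one object per row."""
    rows = [dict(row) for row in csv.DictReader(io.StringIO(text))]
    # Values stay as strings; no type inference is attempted here.
    return json.dumps(rows, indent=2)


json_text = csv_to_json(EXPORT)
print(json_text)
```

The resulting array of objects is the kind of structure a web app can load directly.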
csvkit works on Linux, OS X and Windows with Python 2.6 and 2.7, or with PyPy. Head over to the documentation and get started with faster, better data processing.
Christopher Groskopf is the lead developer on the PANDA Project and a former developer on the Chicago Tribune's News Applications Team. He is also the creator of django-boundaryservice, csvkit, and Hack Tyler. His residence is in flux, but you can find him on Twitter regardless of his present whereabouts: @onyxfish.