OpenSpending has a built-in set of visualizations (bubble charts, treemaps, and tables) that are useful for exploring how data is structured in levels. None of them, however, is well suited to representing spending flows.
Fortunately, users of the D3.js data visualization library have given us many examples of visualizations suitable for that purpose. This tutorial shows how D3.js can be used to visualize spending flows with OpenSpending data.
Introducing D3.js and Sankey diagrams
D3.js has a huge and active community of users, and they have built a large set of example visualizations. Some of these are especially good at showing money flows in an eye-catching way: Sankey diagrams, chord diagrams (or circular networks), and map networks.
Examples: “Energy and consumption” (Sankey diagram), “Uber Rides by Neighborhood” (chord diagram), and “Flows of refugees” (map network).
All of these examples are fully reusable: to use them, you only need to replace their underlying data with your own.
In the following example, we will focus on Sankey diagrams, as they can represent more than two levels of flow. Sankey diagrams are “typically used to visualize energy or material or cost transfers between processes … They’re helpful in locating dominant contributions to an overall flow.”
The Aggregate API
To get spending data into D3.js, we can use the OpenSpending API, which gives us spending data in a form that can easily be translated into something D3.js understands.
The key API for producing spending data visualizations is the “aggregate API,” which groups together entries in the dataset, sums up their values, and returns the result as a JSON object.
An aggregate API call looks like this, where “id” is the ID of an OpenSpending dataset:
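As a minimal sketch, such a call could be assembled in Python as shown below. The endpoint URL and the `dataset` parameter name are assumptions based on version 2 of the OpenSpending API and are not confirmed by this tutorial; “ugr-spending” is one of the dataset IDs used later on.

```python
from urllib.parse import urlencode

# Assumed endpoint of the version 2 aggregate API (not confirmed by this tutorial).
AGGREGATE_ENDPOINT = "https://openspending.org/api/2/aggregate"

# ID of the OpenSpending dataset to aggregate.
dataset_id = "ugr-spending"

# The "dataset" parameter name is an assumption.
url = AGGREGATE_ENDPOINT + "?" + urlencode({"dataset": dataset_id})
print(url)
```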
If no other parameters are included, all entries in the dataset are put in a single group, and the values of every entry are summed together.
Things get more interesting when we add a “drilldown” parameter. This specifies a dimension of the data which will be used to split the set of entries. Each possible value of the specified dimension becomes a group of entries with its own subtotal.
The aggregate API returns an object with two fields, “drilldown” and “summary.” The “summary” field contains information about the dataset. The “drilldown” field is a list with one item for each distinct value of the drilled-down dimension. Each item carries that value together with its “amount,” which is the sum of the spending values of all dataset entries with that value of the dimension.
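To make that structure concrete, here is a hedged Python sketch that walks a response and collects one subtotal per dimension value. The sample response is hypothetical: its labels and figures are placeholders, and the exact shape of each dimension value (a plain string or an object with a “label”) is an assumption.

```python
def subtotals(response, dimension):
    """Map each value of the drilled-down dimension to its summed amount."""
    result = {}
    for item in response["drilldown"]:
        value = item[dimension]
        # Assumption: a dimension value is either a plain string or an
        # object carrying a human-readable "label" (falling back to "name").
        if isinstance(value, dict):
            label = value.get("label", value.get("name"))
        else:
            label = value
        result[label] = result.get(label, 0) + item["amount"]
    return result


# Hypothetical response; labels and amounts are placeholders, not real data.
sample = {
    "drilldown": [
        {"programa": {"name": "a", "label": "Programa A"}, "amount": 300.0},
        {"programa": {"name": "b", "label": "Programa B"}, "amount": 100.0},
    ],
    "summary": {"amount": 400.0},
}

print(subtotals(sample, "programa"))
```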
We can also split the dataset by combinations of dimensions. This API call gives us a subtotal for each combination of “programa” and “to”:
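A sketch of how such a call could be built in Python follows. It assumes the version 2 endpoint, a `dataset` parameter, and a `drilldown` parameter whose dimensions are joined with a pipe character; none of these details are confirmed by this tutorial. The helper name `aggregate_url` is hypothetical, and using “ugr-spending” as the dataset is also an assumption.

```python
from urllib.parse import urlencode

# Assumed endpoint of the version 2 aggregate API.
AGGREGATE_ENDPOINT = "https://openspending.org/api/2/aggregate"


def aggregate_url(dataset, drilldowns=()):
    """Build an aggregate URL (hypothetical helper, assumed parameter names)."""
    params = {"dataset": dataset}
    if drilldowns:
        # Assumption: several dimensions are combined with "|".
        params["drilldown"] = "|".join(drilldowns)
    return AGGREGATE_ENDPOINT + "?" + urlencode(params)


print(aggregate_url("ugr-spending", ["programa", "to"]))
```

Note that `urlencode` percent-encodes the pipe, so the two dimensions appear as `programa%7Cto` in the resulting URL.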
Using the aggregate API to construct D3.js visualizations means writing code to traverse the JSON objects returned by the API and to translate their contents into the form D3.js expects.
Building a Sankey diagram
Time for the full exercise! We will build a D3.js Sankey diagram from the OpenSpending API, in the following way:
- Materials: 2013 income and spending budgets for the University of Granada (UGR) in Spain. These datasets are titled “ugr-income” and “ugr-spending” on OpenSpending.
- Methods: An R script that gets data from the OpenSpending API and transforms it into the JSON input file format of a D3.js Sankey diagram.
- Results: A presentation page embedding the Sankey diagram, OpenSpending treemaps, and raw data.
The first step is to determine what we want to show in the Sankey diagram. Which relations should be displayed? How many levels of flow can the diagram hold and still be readable? What’s the story that you want to tell?
Looking at the UGR income and spending budgets, we can picture money flowing from the sources of income into the University, and the University then spending that money. Following the budgetary structure, we settled on a three-level Sankey diagram:
- Level 1: Income budget broken down by “articulo” (economic classification), flowing into “Universidad de Granada.”
- Level 2: “Universidad de Granada” flowing into the spending budget, broken down by “programas de gasto” (functional classification).
- Level 3: “Programas de gasto” broken down into “capítulos de gasto” (economic classification).
Notice that since the total amounts of the income and spending budgets are equal, both sides of the Sankey diagram have the same size.
The second step is to get the data. As we explained above, OpenSpending has an API that allows us to retrieve data aggregated by measures and drilled down by dimensions.
Getting the JSON data for the three levels of our Sankey diagram is as easy as this:
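One way to sketch the retrieval in Python, rather than in the R script this tutorial relies on, is shown below. The endpoint, the parameter names, and the pipe-separated drilldown syntax are assumptions. So is the choice of the “to” dimension for Level 3: the text mentions it alongside “programa,” but does not state that it holds the “capítulo.” The names `LEVEL_QUERIES` and `fetch_level` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the version 2 aggregate API.
AGGREGATE_ENDPOINT = "https://openspending.org/api/2/aggregate"

# (dataset, drilldown dimensions) for each level of the diagram.
LEVEL_QUERIES = {
    1: ("ugr-income", ["articulo"]),
    2: ("ugr-spending", ["programa"]),
    # Assumption: "to" is the dimension holding the spending chapter.
    3: ("ugr-spending", ["programa", "to"]),
}


def fetch_level(level, opener=urlopen):
    """Fetch and decode the aggregate JSON for one level of the diagram."""
    dataset, drilldowns = LEVEL_QUERIES[level]
    query = urlencode({"dataset": dataset, "drilldown": "|".join(drilldowns)})
    with opener(AGGREGATE_ENDPOINT + "?" + query) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_level(1)` performs a real HTTP request. The `opener` parameter exists only so that the function can be exercised without network access.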
The third step is to produce the JSON input file for the D3.js Sankey diagram. It has two components: nodes and links. The nodes component is an array of labels. The links component is an array of the links that join the nodes (i.e., arrows with variable width), each with three members: source node index, target node index, and value (in this example, an amount of money). The indexes in the links component refer to the position of each node in the nodes component. Check the final JSON input file for this UGR example for further details.
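The index bookkeeping can be sketched in a few lines of Python. This is an illustration, not the tutorial's R script. It assumes each node is written as an object with a “name” key, as in the widely used D3.js Sankey example data. The function name `build_sankey` is hypothetical, and the labels and values are placeholders.

```python
import json


def build_sankey(flows):
    """Turn (source_label, target_label, value) triples into nodes and links."""
    index = {}   # label -> position in the nodes array
    nodes = []
    links = []

    def node_index(label):
        # Register a label the first time it is seen and return its position.
        if label not in index:
            index[label] = len(nodes)
            nodes.append({"name": label})
        return index[label]

    for source, target, value in flows:
        links.append({
            "source": node_index(source),
            "target": node_index(target),
            "value": value,
        })
    return {"nodes": nodes, "links": links}


# Placeholder flows, not real budget figures.
flows = [
    ("Artículo A", "Universidad de Granada", 60),
    ("Artículo B", "Universidad de Granada", 40),
    ("Universidad de Granada", "Programa X", 100),
]
print(json.dumps(build_sankey(flows), ensure_ascii=False, indent=2))
```

Because nodes are keyed by label, two different budget lines that share a label would collapse into a single node, so labels need to be unique across levels.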
So the data for Level 1 has income “articulo” labels as source, a hardcoded “Universidad de Granada” label as target, and amounts as value. Level 2 has the hardcoded “Universidad de Granada” label as source, spending “programa” labels as target, and amounts as value. Level 3 has spending “programa” labels as source, spending “capítulo” (chapter) labels as target, and amounts as value. The provided R script automates the process of retrieving the data and transforming it into a Sankey diagram JSON input file. The code’s comments clarify how it works.
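The mapping from levels to flows can also be sketched in Python, independently of the R script. All names here are hypothetical and the figures are placeholders. The `is_balanced` check illustrates the earlier observation that income and spending totals match, so what flows into a node equals what flows out of it.

```python
import math

UNIVERSITY = "Universidad de Granada"  # hardcoded middle node


def level_flows(income, spending, breakdown):
    """Combine the three levels into (source, target, value) triples."""
    # Level 1: income "articulo" -> University.
    flows = [(articulo, UNIVERSITY, amount) for articulo, amount in income.items()]
    # Level 2: University -> spending "programa".
    flows += [(UNIVERSITY, programa, amount) for programa, amount in spending.items()]
    # Level 3: spending "programa" -> spending "capítulo".
    flows += [
        (programa, capitulo, amount)
        for (programa, capitulo), amount in breakdown.items()
    ]
    return flows


def is_balanced(flows, node):
    """Check that the total flowing into a node equals the total flowing out."""
    inflow = sum(value for _, target, value in flows if target == node)
    outflow = sum(value for source, _, value in flows if source == node)
    return math.isclose(inflow, outflow)


# Placeholder subtotals, not real budget figures.
income = {"Artículo A": 60, "Artículo B": 40}
spending = {"Programa X": 70, "Programa Y": 30}
breakdown = {
    ("Programa X", "Capítulo 1"): 50,
    ("Programa X", "Capítulo 2"): 20,
    ("Programa Y", "Capítulo 1"): 30,
}

flows = level_flows(income, spending, breakdown)
print(is_balanced(flows, UNIVERSITY))
```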
We’ve shown how easy it is to take advantage of the aggregation methods of OpenSpending’s API to extend OpenSpending’s default set of visualizations. D3.js is a powerful toolkit that helps us understand budgetary data better. An out-of-the-box D3.js visualization using OpenSpending as a data warehouse would give the OpenSpending project a nifty boost. In the meantime, take a look at openspending-sankey by Michael Bauer of School of Data, which makes it rather easy to create D3.js Sankey diagrams for virtually every OpenSpending dataset.
J. Félix Ontañón Carmona, Computer Engineer and Digital Citizenship ICT consultant, is a member of OKFN-sp and co-founder of the citizen group OpenKratio (formerly Open Data Sevilla), both citizen organizations that advocate for the principles of #opengovernment, #openculture and #opendata in Spain. In these groups he has helped some 12 universities to open and visualize their economic data and has contributed to the forthcoming regional laws on transparency and citizen participation for Andalusia. OpenKratio is the organization behind Open Data Sevilla, an annual national summit about #opendata and #opengovernment. As a free software lover, he was involved with the GNOME open source desktop project and still maintains some projects, such as Wiican and Udev-discover.